Foreword



“Despite the dotcom boom and bust, the computer and 
telecommunications revolution has barely begun. Over the next few 
decades, the Internet and related technologies really will profoundly 
transform society.” 1



By 2050 the Internet will have impacted our business, culture, and society as 
a whole as much if not more than did Gutenberg’s printing press 600 years 
ago in 1450. Sheer economics will force the majority of business and 
government interactions to be automated. Although the rate and extent of 
automation will vary by domain, most interactions will not only take place 
over the Web, they will be almost entirely free of human interaction. As with 
previous industrial revolutions, the profound impacts are unpredictable, 
especially the social, political, and religious impacts. However, the 
automation of everyday personal, commercial, and governmental activities is 
more easily predicted due to the potential economic benefits and the 
extrapolation of existing automation. The Third Industrial Revolution, the 
Information/Biotech Revolution, is well underway. 

Typically, there are multiple alternative technologies on which next- 
generation technologies might be built. Currently there are only two widely 
accepted enabling technologies that are both new, and hence are in their 
infancy, and mission critical. They are Web Services and the Web, or the 
next-generation Web, called the Semantic Web. To achieve even some of the 
promises for these technologies, we must develop vastly improved solutions 
for addressing the Grand Challenge of Information Technology, namely 
dealing better with semantics or real-world “meaning”. More precisely, we 
must enhance automated actions and data to more closely correspond to the 
real-world actions and facts that they represent, with minimal human 
involvement. This Grand Challenge is the core challenge not just of 
Information Technology but also of all next-generation automated 
applications. This challenge has been calling out for a Silver Bullet since the 
beginning of modern programming. 



1 David Manasian: Digital Dilemmas: A survey of the Internet, The Economist, January 25, 
2003. 







So what is a Silver Bullet? The ancient Greeks believed in the mystical 
power of silver as an infallible defense, means of attack, or solution to an 
otherwise insoluble problem. Germanic folklore of the Middle Ages held that 
only silver could slay man-eating werewolves. In a popular late-nineteenth- 
century English novel a silver bullet was the only means of killing the 
werewolf that plagued London. In a myth from my youth, the Lone Ranger 
TV series, based on 1930-40s novels, starred the Lone Ranger, a masked, 
clean, and heroic vigilante who came to the defense of many a prairie town by 
using a single silver bullet to slay the villain. The term Silver Bullet entered 
into the computing vernacular in 1987 2 when “Silver Bullet” was used 
pejoratively to dismiss the potential of a simple or single solution to 
long-standing and otherwise invincible software engineering challenges. 

“Ontologies: A Silver Bullet for Knowledge Management and Electronic 
Commerce” provides a comprehensive introduction to the only known 
potential Silver Bullet for the Grand Challenge. That Silver Bullet is 
ontologies. An ontology, in the sense used in this book, is a community- 
mediated and accepted description of the kinds of entities that are in a domain 
of discourse and how they are related. They provide meaning, organization, 
taxonomy, agreement, common understanding, vocabulary, and a connection 
to the “real world”. For a given community, dealing with an agreed-upon 
domain (e.g., selling software over the Web), the ontological solution 
provides a definition of all required concepts and their relationships so that 
every program, Web service, or database that solves a problem in that domain 
can automatically communicate with other like entities based on the common 
definitions. Such solutions require concepts, languages, and tools, many still 
in their infancy. This volume gives a comprehensive introduction to 
ontologies in the context of the Semantic Web and Web Services challenges 
that lie at the heart of the Next Generation of computing. It describes and 
illustrates the basic concepts, languages, and tools currently available and in 
development. It illustrates these with knowledge management and electronic- 
commerce applications. One application, selling software over the Web, is 
based on UN/SPSC, an ontology that is accepted and used worldwide. Hence, 
the applications in this volume are not just speculative. They solve real 
problems. What is speculative is the adoption and development of ontological 
concepts, languages, and tools to extend such solutions to all domains. Unlike 
most technological solutions, ontologies start with human, community 
agreement on an ontology. Hence, ontologies are not solely a technical 
challenge. This is what you should expect of a technical solution that 
connects to the real world as ontologies do, by definition. 



2 Frederick P. Brooks: “No Silver Bullet - Essence and Accidents of Software Engineering”, 
IEEE Computer, 20(4): 10-19, April 1987. 







It remains to be seen whether ontologies will be the Silver Bullet for 
Knowledge Management and Electronic Commerce as this volume suggests 
or whether ontologies will be just another failed claim for a next-generation 
technology. To become versed in this, the Grand Challenge of Information 
Technology, and to understand the challenges and potential solutions that 
ontologies, and currently only ontologies, offer, you must understand the 
material offered comprehensively in this volume. The Third Industrial 
Revolution has begun and ontologies offer the hope of a Silver Bullet to 
overcome the Grand Challenge that stands in the way of its realization. 



Michael L. Brodie 
Chief Scientist, Verizon Communications 
White Stallion Ranch, Tucson, AZ, USA 
February 2003 
1 Introduction 



Recently, ontologies have moved from a topic in philosophy to a topic in 
applied artificial intelligence that is at the center of modern computer science. 
Tim Berners-Lee, Director of the World Wide Web Consortium, referred to 
the future of the current WWW as the Semantic Web - an extended Web of 
machine-readable information and automated services that extend far beyond 
current capabilities. The explicit representation of the semantics underlying 
data, programs, pages, and other Web resources will enable a knowledge- 
based Web that provides a qualitatively new level of service. Automated 
services will improve in their capacity to assist humans in achieving their 
goals by “understanding” more of the content on the Web, and thus providing 
more accurate filtering, categorization, and searches of information sources. 
This process will ultimately lead to an extremely knowledgeable system that 
features various specialized reasoning services. These services will support 
us in nearly all aspects of our daily life - making access to information as 
pervasive, and necessary, as access to electricity is today. 

The backbone technology for this Semantic Web is ontologies. Ontologies 
provide a shared understanding of certain domains that can be communicated 
between people and application systems. Ontologies are formal structures 
supporting knowledge sharing and reuse. They can be used to represent 
explicitly the semantics of structured and semistructured information 
enabling sophisticated automatic support for acquiring, maintaining, and 
accessing information. As this is at the center of recent problems in 
knowledge management, enterprise application integration, and e-commerce, 
increasing interest in ontologies is not surprising. Therefore, a number of 
books have recently been published to cover this area. Examples are [Davies 
et al., 2003], [Fensel et al., 2002(a)], [Fensel et al., 2003], [Gomez Perez & 
Benjamins, 2002], and [Maedche, 2002]. However, these other publications 
are either collections of papers written by a diverse group of authors or they 
focus on a specific aspect of ontologies, for example, ontology learning. The 
book Ontologies: A Silver Bullet for Knowledge Management and Electronic 
Commerce is one of the few single-authored books that provide 
comprehensive and concise introductions to the field. The first edition had the 
merit of being the first book that introduced this area to a broader audience. 
Compared to the first edition, three major improvements have been made for 
the second edition: 
• Many recent trends in languages, tools, and applications have been 
integrated and the material has been updated quite substantially, 
reflecting the dynamics of our area of interest. 

• The book is clearly structured into four sections: the concepts underlying 
ontologies; the languages used to define ontologies; the tools for working 
with ontologies; and the application areas of ontologies. 

• Many small mistakes have been eliminated from the text. 

Chapter 2 provides a definition of ontologies and illustrates various 
aspects of ontologies. Chapter 3 provides a survey of ontology languages, 
especially in the context of the Web and the Semantic Web. Chapter 4 
provides examples of all relevant aspects that arise when working with 
ontologies. Even commercial tool sets have become available and are 
described in this chapter. Finally, no technology is complete without its applications. 
Chapter 5 discusses the application of ontologies in areas such as knowledge 
management, enterprise application integration, and e-commerce. 

All that remains is for me to wish the reader enjoyment and entertainment 
while reading about one of the most exciting areas of computer science today. 




2 Concept 



Ontologies were developed in artificial intelligence to facilitate knowledge 
sharing and reuse. Since the beginning of the nineties ontologies have 
become a popular research topic, investigated by several artificial intelligence 
research communities, including knowledge engineering, natural-language 
processing and knowledge representation. More recently, the notion of 
ontology is also becoming widespread in fields such as intelligent 
information integration, cooperative information systems, information 
retrieval, electronic commerce, and knowledge management. The growing 
popularity of ontologies is in a large part due to what they promise: a shared 
understanding of some domain that can be communicated between people 
and application systems. Currently computers are changing from single 
isolated devices to entry points into a worldwide network of information 
exchange and business transactions. Therefore support in the exchange of 
data, information, and knowledge is becoming the key issue in current 
computer technology. Providing shared domain structures is becoming 
essential, and ontologies will therefore become a key asset in describing the 
structure and semantics of information exchange. 

Ontologies have been developed to provide a machine-processable 
semantics of information sources that can be communicated between 
different agents (software and humans). Many definitions of ontologies have 
been given in the last decade, but one that, in our opinion, best characterizes 
the essence of an ontology is based on the related definitions in [Gruber, 
1993]: An ontology is a formal, explicit specification of a shared 
conceptualization. A “conceptualization” refers to an abstract model of some 
phenomenon in the world which identifies the relevant concepts of that 
phenomenon. “Explicit” means that the type of concepts used and the 
constraints on their use are explicitly defined. “Formal” refers to the fact that 
the ontology should be machine readable. Thus different degrees of formality 
are possible. Large ontologies like WordNet provide a thesaurus over 
100,000 natural language terms explained in natural language (see also 
[Meersman, 2000] for a discussion of this issue). At the other end of the 
spectrum is Cyc, which provides formal axiomatizing theories for many aspects 
of common-sense knowledge. “Shared” reflects the notion that an ontology 
captures consensual knowledge, that is, it is not restricted to some individual, 
but accepted by a group. Basically, the role of ontologies in the knowledge 
engineering process is to facilitate the construction of a domain model. An 
ontology provides a vocabulary of terms and relations with which to model 
the domain. Because ontologies aim at consensual domain knowledge their 
development is often a cooperative process involving different people, 
possibly at different locations. People who agree to accept an ontology are 
said to commit themselves to that ontology. 

Ontologies are introduced to facilitate knowledge sharing and reuse 
between various agents, regardless of whether they are human or artificial in 
nature. They are supposed to offer this service by providing a consensual and 
formal conceptualization of a certain area. In a nutshell, ontologies are formal 
and consensual specifications of conceptualizations that provide a shared 
understanding of a domain, an understanding that can be communicated 
across people and application systems. Thus, ontologies glue together two 
essential aspects that help to bring the Web to its full potential: 

• Ontologies define formal semantics for information, thus allowing 
information processing by a computer. 

• Ontologies define real-world semantics, which makes it possible to link 
machine-processable content with meaning for humans based on 
consensual terminologies. 

The latter aspect in particular is still far from having been studied to its 
full extent: how can ontologies be used to communicate real-world semantics 
between human and artificial agents? In answering this question we wish to 
point out two important features of ontologies: they must have a network 
architecture and they must be dynamic. 

Heterogeneity in space or ontology as networks of meaning. From the 
very beginning, heterogeneity has been an essential requirement for this 
ontology network. Tools for dealing with conflicting definitions and strong 
support in interweaving local theories are essential in order to make this 
technology workable and scalable. Islands of meaning must be interwoven to 
form more complex structures enabling exchange of information beyond 
domain, task, and sociological boundaries. This implies two tasks. First, tool 
support must be provided to define local domain models that express a 
commitment of a group of agents that share a certain domain and task and 
that can agree on a joint world-view for this purpose. Second, these local 
models must be interwoven with other models, such as the social practice of 
the agents that use ontologies to facilitate their communicational needs. Little 
work has been done in this latter area. We no longer talk about a single 
ontology, but rather about a network of ontologies. Links must be defined 
between these ontologies and this network must allow overlapping ontologies 
with conflicting - and even contradictory - conceptualizations. 
Development in time or living ontologies. Originally, an ontology was 
intended to reflect the “truth” of a certain aspect of reality. It was the holy 
task of the philosopher to find such truth. Today, ontologies are used as a 
means of exchanging meaning between different agents. They can only 
provide this if they reflect an inter-subjective consensus. By definition, they 
can only be the result of a social process. This gives ontologies a dual status 
for the exchange of meaning: 

• Ontologies as prerequisite for consensus: Agents can only exchange 
meaning when they have already agreed on a joint body of meaning 
reflecting a consensual point of view on the world. 

• Ontologies as a result of consensus: Ontologies as consensual models of 
meaning can only arise as the result of a process where agents agree on a 
certain model of the world and its interpretation. 

Thus, ontologies are as much a prerequisite for consensus and information 
sharing as they are the results of them. An ontology is as much required for 
the exchange of meaning as the exchange of meaning may influence and 
modify an ontology. Consequently, evolving ontologies describe a process 
rather than a static model. Having protocols for the process of evolving 
ontologies is the real challenge. Evolving over time is an essential 
requirement for useful ontologies. As daily practice constantly changes, 
ontologies that mediate the information needs of these processes must have 
strong support in versioning and must be accompanied by process models 
that help to organize consensus. 

Depending on their level of generality, different types of ontologies may 
be identified that fulfill different roles in the process of building 
knowledge-based systems ([Guarino, 1998], [van Heijst et al., 1997]). 
Among others, we can distinguish the following ontology types: 

• Domain ontologies capture the knowledge valid for a particular type of 
domain (e.g., the electronic, medical, mechanical, or digital domain). 

• Metadata ontologies like Dublin Core [Weibel et al., 1995] provide a 
vocabulary for describing the content of on-line information sources. 

• Generic or common-sense ontologies aim at capturing general 
knowledge about the world, providing basic notions and concepts for 
things like time, space, state, event, etc. ([Fridman-Noy & Hafner, 
1997]). As a consequence, they are valid across several domains. For 
example, an ontology about mereology (part-of relations) is applicable in 
many technical domains [Borst & Akkermans, 1997]. 
• Representational ontologies do not commit themselves to any particular 
domain. Such ontologies provide representational entities without stating 
what should be represented. A well-known representational ontology is 
the Frame-Ontology [Gruber, 1993], which defines concepts such as 
frames, slots, and slot constraints allowing the expression of knowledge 
in an object-oriented or frame-based way. 

• Other types of ontology are so-called method and task ontologies 
([Fensel & Groenboom, 1997], [Studer et al., 1996]). Task ontologies 
provide terms specific for particular tasks (e.g. “hypothesis” belongs to 
the diagnosis task ontology), and method ontologies provide terms 
specific to particular problem-solving methods (e.g. “correct state” 
belongs to the propose-and-revise method ontology). Task and method 
ontologies provide a reasoning point of view on domain knowledge. 

In the following, we will discuss some illustrations: WordNet, Cyc, 
TOVE, and (KA)². 

WordNet 1 (see [Fellbaum, 1999]) is an on-line lexical reference system 
whose design is inspired by current psycholinguistic theories of human 
lexical memory. English nouns, verbs, adjectives and adverbs are organized 
into synonym sets, each representing one underlying lexical concept. 
Different relations link the synonym sets. It was developed by the Cognitive 
Science Laboratory at Princeton University. WordNet contains around 
100,000 word meanings organized in a taxonomy. WordNet groups words 
into five categories: noun, verb, adjective, adverb, and function word. Within 
each category it organizes the words by concepts (i.e., word meanings) and 
semantic relationships between words. Examples of these relationships are: 

• Synonymy: Similarity in meaning of words, which is used to build 
concepts represented by a set of words. 

• Antonymy: Dichotomy in meaning of words, mainly used for organizing 
adjectives and adverbs. 

• Hyponymy: Is-a relationship between concepts. This is-a hierarchy 
ensures the inheritance of properties from super-concepts to sub- 
concepts. 

• Meronymy: Part-of relationship between concepts. 

• Morphological: relations which are used to reduce word forms. 
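
The relations listed above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the class name `Synset`, its fields, and the sample words and properties are hypothetical and are not taken from WordNet's actual data or API. It models synonymy as a set of words, hyponymy as an is-a link with property inheritance, and meronymy as a list of parts; antonymy and morphological relations are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Synset:
    """One lexical concept: a set of synonymous words plus its relations."""
    words: frozenset                      # synonymy: words sharing one meaning
    hypernym: Optional["Synset"] = None   # hyponymy: this concept is-a hypernym
    properties: set = field(default_factory=set)
    parts: list = field(default_factory=list)  # meronymy: part-of links

    def inherited_properties(self) -> set:
        """Collect properties along the is-a chain up to the root concept."""
        collected = set()
        node = self
        while node is not None:
            collected |= node.properties
            node = node.hypernym
        return collected


# Hypothetical sample data, chosen only to exercise the relations.
animal = Synset(frozenset({"animal", "creature"}), properties={"alive"})
dog = Synset(frozenset({"dog", "hound"}), hypernym=animal, properties={"barks"})
tail = Synset(frozenset({"tail"}))
dog.parts.append(tail)

print(sorted(dog.inherited_properties()))
```

The sub-concept picks up the property of its super-concept without restating it, which is the inheritance that the is-a hierarchy is meant to ensure.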



1 http://www.cogsci.princeton.edu/~wn 
The success of WordNet is based on the fact that it is available on-line, 
free of charge, and that it is a dictionary based on concepts, i.e. it provides 
much more than just an alphabetic list of words. A multilingual European 
version EuroWordNet 2 has also come into being. Specific features of 
WordNet are its large size (i.e., number of concepts), its domain- 
independence, and its low level of formalization. By the latter I mean that 
WordNet does not provide any semantic definitions in a formal language. The 
semantics of concepts is defined with natural language terms. This leaves 
definitions vague and limits the possibility for automatic reasoning support. 
WordNet is mainly linguistically motivated. In this respect, WordNet can be 
seen as one extreme of a spectrum where Cyc defines the other extreme. 

Cyc 3 [Lenat & Guha, 1990] was initiated in the course of research into 
artificial intelligence, making common-sense knowledge accessible and 
processable for computer programs. The lack of common-sense knowledge 
and reasoning was encountered in many if not all application areas of 
artificial intelligence as the main barrier to enabling intelligence. Take 
machine learning as an example: on the one hand, learning is a prerequisite of 
intelligence; on the other hand, intelligence is a prerequisite for meaningful 
learning. Humans decide based on their common-sense knowledge what to 
learn and what not to learn from their observations. Cyc started as an 
approach to formalizing this knowledge of the world and providing it with a 
formal and executable semantics. Hundreds of thousands of concepts have 
since been formalized with millions of logical axioms, rules, and other 
assertions which specify constraints on the individual objects and classes. 
Some of them are publicly available on the Web page. The upper-level 
ontology of Cyc with 3000 concepts has been made publicly available. These 
are the most generic concepts which are situated at a high level in the 
taxonomy of concepts. Most of the more specific concepts are kept secret as 
property of Cycorp, the company that markets Cyc. 

Cyc groups concepts into microtheories to structure the overall ontology. 
Microtheories are a means to express the context dependency of knowledge 
(i.e., what is right in one context may be wrong in another one, see [Lenat, 
submitted]). They are a means to structure the whole knowledge base, which 
would be otherwise inconsistent and unmaintainable. Each microtheory is a 
logical theory introducing terms and defining their semantics with logical 
axioms. CycL, a variant of predicate logic, is used as language for expressing 
these theories. Like WordNet, Cyc is rather large and domain-independent. In 
contrast to WordNet it provides formal and operational definitions. 
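
The idea of context-dependent truth can be sketched in a few lines. This is plain Python, not CycL, and it is not how Cyc is implemented: the class `MicrotheoryStore`, the microtheory names, and the sample statement are all hypothetical. It only shows why partitioning assertions by context keeps a knowledge base consistent when the same statement is true in one context and false in another.

```python
class MicrotheoryStore:
    """Toy knowledge base that keeps assertions separate per context."""

    def __init__(self):
        # Maps a microtheory name to its own {statement: truth value} table.
        self._contexts = {}

    def assert_in(self, context, statement, truth=True):
        """Record that a statement is true or false within one context."""
        self._contexts.setdefault(context, {})[statement] = truth

    def holds(self, context, statement):
        """Return True or False if the context decides it, otherwise None."""
        return self._contexts.get(context, {}).get(statement)


store = MicrotheoryStore()
# Hypothetical microtheories: the same statement gets opposite truth values.
store.assert_in("EverydayPhysicsMt", "dragons exist", False)
store.assert_in("FairyTaleMt", "dragons exist", True)

print(store.holds("EverydayPhysicsMt", "dragons exist"))
print(store.holds("FairyTaleMt", "dragons exist"))
```

Stored in one flat table, the two assertions would contradict each other; scoped to their contexts, they coexist.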



2 http://www.illc.uva.nl/EuroWordNet 

3 http://www.cyc.com 
TOVE 4 ([Fox et al., 1993], [Fox & Gruninger, 1997]) is an example of a 
task- and domain-specific ontology. The ontology supports enterprise 
integration, providing a sharable representation of knowledge. The goal of 
the TOVE (TOronto Virtual Enterprise) project is to create a generic, 
reusable data model that has the following characteristics: 

• it provides a shared terminology for the enterprise that each agent can 
jointly understand and use, 

• it defines the meaning of each term as precisely and unambiguously as 
possible, 

• it implements the semantics in a set of axioms that will enable TOVE to 
automatically deduce the answer to many “common-sense” questions 
about the enterprise, and 

• it defines a symbology for depicting a term or the concept constructed 
thereof in a graphical context. 

In consequence, TOVE provides a reusable representation (i.e., ontology) 
of industrial concepts. Using ontologies for information exchange and 
business transactions is also investigated in [Uschold et al., 1996]. 

The Knowledge Annotation Initiative of the Knowledge Acquisition 
Community, known as (KA)² (see [Benjamins et al., 1999]), was a case study 
on: 

• the process of developing an ontology for a heterogeneous and 
worldwide (research) community, and 

• the use of the ontology for providing semantic access to on-line 
information sources of this community. 

(KA)² comprises three main subtasks: (1) ontological engineering to build 
an ontology of the subject matter; (2) characterizing the knowledge in terms 
of the ontology; and (3) providing intelligent access to the knowledge. In 
(KA)², an ontology of the knowledge acquisition community (cf. an 
“enterprise knowledge map”) was built. Since an ontology should capture 
consensual knowledge, several researchers cooperated - at different 
locations - to construct the ontology in (KA)². In this way, it was ensured that 
the ontology would be accepted by a majority of knowledge acquisition 
researchers. The design criteria used to build this ontology were: modularity, 
to allow more flexibility and a variety of uses; specialization of general 
concepts into more specific concepts; classification of concepts to enable 
inheritance of common features; and standardized name conventions. The 
ontology for the KA community consists of seven related ontologies: an 
organization ontology, a project ontology, a person ontology, a research-topic 
ontology, a publication ontology, an event ontology, and a research-product 
ontology. Six of these ontologies are rather generic, while the research-topic 
ontology is specific to the investigated domain (see Fig. 
1). Actually, a meta-ontology (i.e., a template) for describing research topics 
was defined first. Then this template was instantiated for the research topics. 
The topics that were identified in a number of international meetings are: 
reuse; problem-solving methods; ontologies; validation and verification; 
specification languages; knowledge acquisition methodologies; agent- 
oriented approaches; knowledge acquisition from natural language; 
knowledge management; knowledge acquisition through machine learning; 
knowledge acquisition through Conceptual Graphs; foundations of 
knowledge acquisition; evaluation of knowledge acquisition techniques and 
methodologies; and knowledge elicitation. Each of these topics was given to 
a small group of experts who completed the scheme in Fig. 1. 



Class: research-topic 
  Attributes: 
    Name: <string> 
    Description: <text> 
    Approaches: <set-of keyword> 
    Research-groups: <set-of research-group> 
    Researchers: <set-of researcher> 
    Related-topics: <set-of research-topic> 
    Subtopics: <set-of research-topic> 
    Events: <set-of events> 
    Journals: <set-of journal> 
    Projects: <set-of project> 
    Application-areas: <text> 
    Products: <set-of product> 
    Bibliographies: <set-of HTML-link> 
    Mailing-lists: <set-of mailing-list> 
    Webpages: <set-of HTML-link> 
    International-funding-agencies: <funding-agency> 
    National-funding-agencies: <funding-agency> 
    Author-of-Ontology: <set-of researcher> 
    Date-of-last-modification: <date> 

Fig. 1 The meta-ontology for specifying research topics in (KA)² 
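
To see how a template of this kind is instantiated, the following sketch renders a subset of the attributes of Fig. 1 as a Python dataclass. It is an illustration only: the class `ResearchTopic`, the choice of which attributes to keep, and the link drawn between the two sample topics are assumptions made here, not part of the (KA)² ontology itself. The topic names "reuse" and "ontologies" are taken from the list of topics above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class ResearchTopic:
    """Subset of the research-topic template; set-valued slots default to empty."""
    name: str
    description: str = ""
    approaches: Set[str] = field(default_factory=set)
    related_topics: List["ResearchTopic"] = field(default_factory=list)
    subtopics: List["ResearchTopic"] = field(default_factory=list)
    webpages: List[str] = field(default_factory=list)
    date_of_last_modification: Optional[date] = None


# Instantiating the template for two topics; relating them is an assumption.
reuse = ResearchTopic(name="reuse")
ontologies = ResearchTopic(name="ontologies", related_topics=[reuse])

print([topic.name for topic in ontologies.related_topics])
```

Each expert group would fill in one such instance, so the template fixes which slots exist while the instances carry the domain content.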
Summary. An ontology provides an explicit conceptualization (i.e., 
meta-information) that describes the semantics of the data. It has a similar function 
to a database schema. The differences are 5: 

• A language for defining ontologies is syntactically and semantically 
richer than common approaches for databases. 

• The information that is described by an ontology consists of 
semistructured natural language texts and not tabular information. 

• An ontology must be a shared and consensual terminology because it is 
used for information sharing and exchange. 

• An ontology provides a domain theory and not the structure of a data 
container. 

In a nutshell, ontology research is database research for the 21st century 
where data need to be shared and not always fit into a simple table. 



5 See [Meersman, 2000] for an elaborate comparison of database schemas and ontologies. 




3 Languages 



This chapter is devoted to the language infrastructure that will enable 
ontologies to be put into practice. We have already mentioned the fact that 
computers are changing from single isolated devices to entry points into a 
worldwide network of information exchange and business transactions. 
Therefore, support for data, information, and knowledge exchange is 
becoming the key issue in computer technology. In consequence, strenuous 
efforts are being made towards a new standard for defining and exchanging 
data structures. The eXtensible Markup Language (XML) is a Web standard 
that provides such facilities. In this chapter we will therefore investigate 
XML in detail before going on to describe how it relates to the use of 
ontologies for information exchange. The Resource Description Framework 
(RDF) is a second important standard when talking about the Web and 
ontologies. Ontologies are formal theories about a certain domain of 
discourse and therefore require a formal logical language to express them. 
We will discuss some of the major formal approaches and we will investigate 
how recent Web standards such as XML and RDF relate to languages that 
express ontologies. 



3.1 XML 

XML is a tag-based language for describing tree structures with a linear 
syntax. It is a successor to the Standard Generalized Markup Language 
(SGML), which was developed long ago for describing document structures. 
However, whereas HTML is too simple to serve our purpose, SGML was 
seen to be too complex to become a widely used standard. XML simplifies 
some aspects of SGML that were not viewed as essential. An example of a 
simple XML document is provided in Fig. 2. 

XML provides seven different means for presenting information: 

1 Elements 

Elements are the basic building block of markup: <tag> contents </tag> 
<?xml version="1.0"?> 
<homepage> 
  <heading>This is Fensel's Homepage!</heading> 
  <paragraph> 
    Hello! My Name is 
    <name>Dieter Fensel</name> 
    and my email is:<br/> 
    <email>dfe@aifb.uni-karlsruhe.de</email> 
    and my phone number is:<br/> 
    <phone type="office">6084751</phone> 
    <phone type="private">9862155</phone> 
  </paragraph> 
</homepage> 

Fig. 2 A simple XML example 



2 Attributes 

Attributes are name-value pairs defined within a tag: 

<tag attribute-name="attribute-value" ...></tag> 

3 References 

References can be used to write symbols in a text that would otherwise be 
interpreted as commands, for example, “<” can be written as &lt; for use 
as text in XML. References can also be used to define macros. Often-used 
text or links can be defined as macros and need to be written and 
maintained only at one place. Entity references always start with “&” and 
end with “;”. 



4 Comments 

Comments begin with <!-- and end with -->. XML processors may 
ignore comments. 

5 Processing Instructions 

Processing Instructions (PI) are the procedural element in an otherwise 
declarative approach. Processing Instructions have the form: 

<?name pidata?> 

Like comments, Processing Instructions can be ignored by the XML processor 
itself, but it must pass them through to the application. The application executes all 
Processing Instructions it knows. An example of a Processing Instruction 
is: 

<?xml-stylesheet type="text/css" href="style.css"?> 

6 CDATA

CDATA represents arbitrary strings in XML Documents which are not 
interpreted by an XML parser. 

<![CDATA[
XML uses <begin-tag> and <end-tag> to structure documents.
]]>

7 Prolog 

The XML declaration <?xml version="1.0"?> is obligatory. In addition,
a prolog may contain further elements. An XML document may use a
document type declaration either by containing its definition or by 
pointing to it. Such a document type declaration defines a grammar for 
XML documents and is called a Document Type Definition (DTD). An 
external definition which is pointed to by a reference looks like this: 

<!DOCTYPE Name SYSTEM "name.dtd">

while an internal definition looks like 

<!DOCTYPE Name [<!ELEMENT Name (#PCDATA)>]>
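
Several of these means can be observed together by handing a small document to a parser. The following is a minimal sketch in Python, assuming the standard library module xml.etree.ElementTree; the document is a shortened variant of Fig. 2, and the note element, the comment, and the processing instruction target app are additions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened variant of the homepage example; <note>, the comment and the
# processing instruction are illustrative additions, not part of Fig. 2.
DOC = """<?xml version="1.0"?>
<homepage>
  <!-- a comment: the default parser drops it -->
  <?app do-something?>
  <heading>This is Fensel's Homepage!</heading>
  <name>Dieter Fensel</name>
  <phone type="office">6084751</phone>
  <phone type="private">9862155</phone>
  <note>1 &lt; 2 and <![CDATA[<begin-tag> is plain text here]]></note>
</homepage>"""

root = ET.fromstring(DOC)

# Elements are reached by tag name, attributes by key.
phones = {phone.get("type"): phone.text for phone in root.findall("phone")}

# The reference &lt; and the CDATA section both end up as ordinary text.
note = root.find("note").text

# Comments and processing instructions are not kept as child elements.
child_tags = [child.tag for child in root]
```

With this parser configuration the comment and the processing instruction are silently discarded; an application that needs them would have to configure the parser differently.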



3.1.1 What Are DTDs? 

In this section we will discuss the usefulness of DTDs and then show how 
DTDs can be defined. An XML document is well formed if 

• the document starts with an XML declaration; 

• all tags with contents have begin and end tags; tags without contents 
have an end tag or end with “/>”; 

• it has a root (XML documents are trees). 

An XML document is valid if it is well formed, and if the document uses a 
DTD it respects this DTD. Therefore, DTDs are not necessary for XML 
documents; however, they provide the ability to define stronger constraints 
for documents. 
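
The difference between the two notions can be made concrete with a parser that checks well-formedness only. The sketch below assumes Python's standard xml.etree.ElementTree, which does not validate against a DTD; the helper name is_well_formed is our own.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a single XML tree."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

balanced = is_well_formed("<a><b/></a>")      # every tag is closed, one root
unbalanced = is_well_formed("<a><b></a>")     # <b> is never closed
two_roots = is_well_formed("<a/><b/>")        # a document must have one root
```

Checking validity would additionally require a validating parser that reads the DTD, which is outside the scope of this sketch.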

A DTD consists of three kinds of declarations: element declarations that
define composed tags and value ranges for elementary tags; attribute
declarations that define the attributes of tags; and entity declarations.

<?xml version="1.0"?>
<!DOCTYPE name [
<!ELEMENT name (title*, first name | initial, middle name?, last name+)>]>
<!DOCTYPE first name [
<!ELEMENT first name #PCDATA>]>
<name>
  <title>Privatdozent</title>
  <title>Dr.</title>
  <first name>Dieter</first name>
  <last name>Fensel</last name>
</name>

Note: #PCDATA stands for parseable character data.



Fig. 3 A simple element declaration 



An example of element declarations and a valid XML document is given 
in Fig. 3. In this the following hold true: 

• “?” = zero or one appearance of an element 

• “*” = zero to n appearances of an element 

• “+” = one to n appearances of an element 

• “a | b” = either a or b appears
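
These operators behave like the corresponding operators of regular expressions, which suggests a simple way to experiment with a content model. The following sketch assumes Python's re module and a hypothetical encoding in which each child tag is written as "name;". The tag names use underscores instead of the spaces shown in Fig. 3, and the grouping of the alternative is our reading of the figure.

```python
import re

# Content model of <name> from Fig. 3, written as a regular expression over
# the sequence of child tags. Each tag is encoded as "tagname;" (assumption).
CONTENT_MODEL = re.compile(
    r"(title;)*"                 # title*       : zero to n appearances
    r"(first_name;|initial;)"    # a | b        : exactly one of the two
    r"(middle_name;)?"           # middle_name? : zero or one appearance
    r"(last_name;)+"             # last_name+   : one to n appearances
)

def matches(children):
    """Check a list of child tag names against the content model."""
    encoded = "".join(tag + ";" for tag in children)
    return CONTENT_MODEL.fullmatch(encoded) is not None

fig3 = matches(["title", "title", "first_name", "last_name"])
no_last_name = matches(["first_name"])
both_alternatives = matches(["first_name", "initial", "last_name"])
```

A validating parser performs essentially this check for every element that has a declared content model.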

Attribute declarations regulate the following aspects: the elements that 
may have an attribute; the attributes they have; the values an attribute may 
have; and the default value of an attribute. Its general form is: 

<!ATTLIST element-name
    attribute-name_1 attribute-type_1 default-value_1
    ...
    attribute-name_n attribute-type_n default-value_n
>

There are six attribute types: cdata = string; id = Unique key; idref and 
idrefs = reference for one or several IDs in the document; entity or 
entities = name of one or several entities; nmtoken or nmtokens = value is 
one or several words; and a list of names (enumeration type). 

Finally, four types of default values can be distinguished:

• #REQUIRED. The attribute must have a value.

• #IMPLIED. The attribute need not have a value, and no default value is
defined.

• "value". This value is the value of the attribute if nothing else is defined
explicitly.

• #FIXED "value". If the attribute is used it must have this default value.
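
The four cases can be summarized as a small decision procedure. The sketch below is plain Python and does not use any XML library; the function name effective_value and the label "default" for the third case are our own choices.

```python
def effective_value(kind, declared=None, supplied=None):
    """Return the value an attribute ends up with, or raise on a violation.

    kind     -- "#REQUIRED", "#IMPLIED", "#FIXED" or "default" (our label)
    declared -- the value written in the attribute declaration, if any
    supplied -- the value written in the document, or None if absent
    """
    if kind == "#REQUIRED":
        if supplied is None:
            raise ValueError("a value must be supplied")
        return supplied
    if kind == "#IMPLIED":
        return supplied          # may be None: no value and no default
    if kind == "#FIXED":
        if supplied is not None and supplied != declared:
            raise ValueError("the value must equal the fixed default")
        return declared
    if kind == "default":
        return declared if supplied is None else supplied
    raise ValueError(f"unknown default kind: {kind}")

implied_absent = effective_value("#IMPLIED")
defaulted = effective_value("default", declared="office")
overridden = effective_value("default", declared="office", supplied="private")
```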

Entities enable the definition of symbolic values. This may provide 
shortcuts for long expressions, for example dfe for Privatdozent Dr. Dieter 
Andreas Fensel. 

<!ENTITY dfe "Privatdozent Dr. Dieter Andreas Fensel">

Even more important, it significantly improves the maintainability of 
XML documents. Elements that appear in several places within a document 
need only be changed once based on their central description. 
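
The effect of such a central description can be tried out with a parser that expands internal entities. The following sketch assumes Python's standard xml.etree.ElementTree; the element names author and contact are hypothetical.

```python
import xml.etree.ElementTree as ET

# The entity dfe is declared once in the internal DTD subset and used twice.
DOC = """<?xml version="1.0"?>
<!DOCTYPE homepage [
  <!ENTITY dfe "Privatdozent Dr. Dieter Andreas Fensel">
]>
<homepage>
  <author>&dfe;</author>
  <contact>Write to &dfe;.</contact>
</homepage>"""

root = ET.fromstring(DOC)
author = root.find("author").text
contact = root.find("contact").text
```

Changing the entity declaration changes both places at once, which is the maintainability benefit described above.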

XML Schema is another means of defining constraints on the syntax and 
structure of valid XML documents (see [Biron & Malhotra, 2000], 
[Thompson et al., 2000], [Walsh, 1999]). A more accessible explanation of 
XML Schema can be found in [Fallside, 2000]. XML schemas have the same 
purpose as DTDs, but provide several significant improvements: 

• XML schema definitions are themselves XML documents. 

• XML schema provides a rich set of data types that can be used to define 
the values of elementary tags. 

• XML schema provides a much richer means of defining nested tags (i.e., 
tags with sub-tags). 

• XML schema provides the name-space mechanism to combine XML 
documents with heterogeneous vocabulary. 

Compared to DTDs, XML Schema provides various advantages; however,
working with schemas and developing tools for them is more complex.

3.1.2 Linking in XML 

HyperText Markup Language (HTML) provides representation of textual
information and hyperlinks between various documents and parts thereof.
XML incorporates a similar but generalized linking mechanism. In general, 
three kinds of links will be provided: simple links, extended links, and 
extended pointers. 

Simple links resemble HTML links, for example,

<LINK XML-LINK="SIMPLE" HREF="locator">text</LINK>

However, the locator may be a URL (as in HTML), a query (see XML-QL),
or an extended pointer (see below).

Extended links can express relations with more than two addressees: 

<ELINK XML-LINK="EXTENDED" ROLE="ANNOTATIONS">
  <LOCATOR XML-LINK="LOCATOR" HREF="text.loc">text</LOCATOR>
  <LOCATOR XML-LINK="LOCATOR" HREF="Annot1.loc">Ann1</LOCATOR>
  <LOCATOR XML-LINK="LOCATOR" HREF="Annot2.loc">Ann2</LOCATOR>
</ELINK>

Extended Pointers (XPointers). In HTML a URL can point to a specific
part of a document, for example, “http://www.a.b/c#name” where “name” is 
the name of an anchor tag. In XML you can jump to an arbitrary location 
within a document, for example, in a list of employees you can jump to the 
row of employees with the name Miller or to the 10th row or to the row with 
the ID “007”. Therefore, XPointers are similar to general queries. 
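
The three lookups mentioned here can be imitated with path expressions. The sketch below does not use XPointer itself; it assumes the limited XPath subset of Python's xml.etree.ElementTree as a stand-in, and the employee list with its row, name, and id names is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Build a hypothetical list of twelve employees; row 3 is called Miller.
root = ET.Element("employees")
for i in range(1, 13):
    row = ET.SubElement(root, "row", id=f"{i:03d}")
    ET.SubElement(row, "name").text = "Miller" if i == 3 else f"Employee{i}"

by_name = root.find("row[name='Miller']")   # the row of the employee Miller
tenth = root.find("row[10]")                # the 10th row
by_id = root.find("row[@id='007']")         # the row with the ID "007"
```

Each expression selects a location by a condition on the document content rather than by a predefined anchor, which is why such pointers resemble queries.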



3.1.3 Extensible Stylesheet Language (XSL)

A browser can render an HTML document because it recognizes all 
HTML tags. Therefore it can use predefined style information. However, in 
XML tags can be defined by the information provider. How does a browser 
render XML documents? It requires additional style sheet information for this 
purpose. Cascading Style Sheets (CSS) define how a browser should render
XML documents. CSS was originally developed for HTML to allow more
flexibility in layout, helping to bring HTML back to its original purpose of
being a language for describing the structure of documents instead of their
layout. A more expressive choice in the case of XML is XSL. This is the
upcoming standard for expressing format information of XML documents,
but it can do much more than this. CSS defines for each element of a
document how it should be rendered. XSL allows us to define views that
manipulate the structure and elements of a document before they are
rendered. Therefore, XSL even enables the translation of one XML document 
into another using a different DTD. This is important in cases where different 
users may wish to have different views of the information captured in an 
XML document. In electronic commerce applications, it enables different 
product presentations for different clients and for different user groups (for 
example, clients who build and maintain product catalogs versus users who
access these catalogs). Therefore, XSL has more expressive power than CSS.
It is comparable to the expressiveness of DSSSL which is used for presenting 
SGML documents. However, in contrast to the Lisp syntax of DSSSL, XSL 
has an XML syntax. 

At the moment, XSL can be used for server-side translation of XML
documents into HTML. HTML is just another XML dialect and this
translation by the server is required because most browsers currently do not
support XML and XSL for rendering. In general, the dynamic manipulation
of XML documents can be used to create different pages from the same data
sources, and to realize dynamically changing pages according to user
preferences or contexts. XML is a standard language for defining tagged
languages. However, XML does not provide standard DTDs, i.e., each user
can/may/must define his own DTD. For exchanging data between different
users relying on different DTDs, you have to map different DTDs onto each
other. You can use XSL to translate XML documents using DTD1 into XML
documents using DTD2, providing the translation service required for
electronic commerce mentioned earlier. Precisely here lies the importance of
XSL in our context.

How does XSL achieve this? XSL is a language for expressing stylesheets.
Each stylesheet describes rules for presenting a class of XML source
documents. There are two parts to the presentation process: First, the result 
tree is constructed from the source tree. Second, the result tree is interpreted 
to produce formatted output on a display, on paper, in speech or on other 
media. 

The first part is achieved by associating patterns with templates. 

• A pattern is matched against elements in the source tree. 

• A template is instantiated to create part of the result tree. 

• The result tree is separate from the source tree. 

In consequence, the structure of the result tree can be completely different 
from the structure of the source tree. In constructing the result tree, the source 
tree can be filtered and reordered, and arbitrary structure can be added. 
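
The interplay of patterns and templates can be illustrated without an XSL processor. The following is a minimal sketch in Python, assuming the standard xml.etree.ElementTree; patterns are reduced to plain tag names, and the names apply_templates, para_template, and transform as well as the element names are our own simplifications, not XSL syntax.

```python
import xml.etree.ElementTree as ET

def apply_templates(node, rules):
    """Match each child against the rules and instantiate its template."""
    produced = []
    for child in node:
        template = rules.get(child.tag)       # pattern = tag name (assumption)
        if template is not None:              # unmatched children are filtered
            produced.append(template(child, rules))
    return produced

def para_template(node, rules):
    """Template for para elements: create a block in the result tree."""
    block = ET.Element("block", {"font-size": "10pt"})
    block.text = node.text
    return block

def transform(source_root, rules):
    """Build a separate result tree; the source tree is left untouched."""
    result_root = ET.Element("page-sequence")  # stands in for the root rule
    result_root.extend(apply_templates(source_root, rules))
    return result_root

source = ET.fromstring(
    "<doc><para>one</para><note>skip</note><para>two</para></doc>"
)
result = transform(source, {"para": para_template})
block_texts = [block.text for block in result]
```

The result tree contains only the two blocks, while the source tree still holds all three children, which reflects the separation of source and result tree described above.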

The second part, formatting, is achieved by using the formatting
vocabulary specified by XSL to construct the result tree.

The following is an example of a simple XSL stylesheet that constructs a result tree
for a sequence of para elements. The result-ns="fo" attribute indicates that a tree
using the formatting object vocabulary is being constructed. The rule for the root node
specifies the use of a page sequence formatted with any font with serifs. The
para elements become block formatting objects which are set in 10 point type with a
12 point space before each block.



<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/TR/WD-xsl"
    xmlns:fo="http://www.w3.org/TR/WD-xsl/FO"
    result-ns="fo">
  <xsl:template match="/">
    <fo:basic-page-sequence font-family="serif">
      <xsl:apply-templates/>
    </fo:basic-page-sequence>
  </xsl:template>
  <xsl:template match="para">
    <fo:block font-size="10pt" space-before="12pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>

Fig. 4 A simple XSL document 



• Formally, this vocabulary is an XML name space. Each element type in 
the vocabulary corresponds to a formatting object class. 

• A formatting object class represents a particular kind of formatting 
behavior. 

• Each attribute in the vocabulary corresponds to a formatting property. 

• A formatting object can have content, and its formatting behavior is 
applied to its content. 

An example of an XSL file is provided in Fig. 4. 

3.1.4 Additional Information 

There are large quantities of information available about XML. The official 
Web pages about XML are hosted by the W3C, 1 the standardization 
committee of the World Wide Web. 



1 http://www.w3c.org

In addition, there is an excellent FAQ list at http://www.ucc.ie/xml and
numerous books dealing with XML have appeared (e.g., [Connolly, 1997]). 
Articles and tutorials on XML can be found at: 

• http://metalab.unc.edu/pub/sim-info/standards/xml/why/xmlapps.html 

• http://www.inrialpes.fr/inria/seminaires/XML1-10.12.98/sld00000.htm

• http://www.inrialpes.fr/inria/seminaires/XML2-10.12.98/sld00000.htm

• http://www.gca.org/confrparis98/bosak/sld00000.htm 

• http://www.heise.de/ix/raven/Web/xml 

Finally, Robin Cover’s site at OASIS is one of the richest online sources
on these topics: http://www.oasis-open.org/cover/xml.html. 



3.2 RDF 

XML provides semantic information as a by-product of defining the structure 
of the document. It prescribes a tree structure for documents and the different 
leaves of the tree have well-defined tags and contexts with which the 
information can be understood. That is, the structure and semantics of 
documents are interwoven. The Resource Description Framework 2 (see 
[Miller, 1998], [Lassila & Swick, 1999]) provides a means of adding 
semantics to a document without making any assumptions about its structure. 
RDF is an infrastructure that enables the encoding, exchange, and reuse of 
structured metadata. Search engines, intelligent agents, information brokers, 
browsers and human users can make use of semantic information. RDF is an 
XML application (i.e., its syntax is defined in XML) customized for adding 
meta-information to Web documents and will be used by other standards such 
as PICS-2, P3P, and DigSig (see Fig. 5). 

The RDF data model provides three object types: subjects, predicates, and 
objects (see the schema definition of RDF [Brickley et al., 1998]). 

• A subject is an entity that can be referred to by an address on the WWW
(i.e., by a URL or URI). Such entities are called resources; they are the
elements that are described by RDF statements.

• A predicate defines a binary relation between resources and/or atomic
values provided by primitive data type definitions in XML.

• An object specifies a value for a subject’s predicate. That is, objects 
provide the actual characterizations of the Web documents. 



2 http://www.w3c.org/Metadata 

A simple example is 3

Author(http://www.cs.vu.nl/frankh) = Frank

This states that the author of the named Web document is Frank. Values 
can also be structured entities: 

Author(http://www.cs.vu.nl/frankh) = X

Name(X) = Frank

Email(X) = frankh@cs.vu.nl

where X denotes an actual (i.e., the homepage of Frank) or a virtual URL. In
addition, RDF provides bags, sequences, and alternatives to express
collections of Web sources.

Finally, RDF can be used to make statements about RDF statements, i.e. it 
provides meta-level facilities: 

Claim(Dieter) = (Author(http://www.cs.vu.nl/frankh) = Frank)

states that Dieter claims that Frank is the author of the named resource. 
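
The three kinds of statements used in these examples can be mirrored by a very small data structure. The sketch below is plain Python and not an RDF library; the class Statement, the helper value_of, and the label "_:X" for the virtual resource are assumptions made for illustration.

```python
from typing import Any, NamedTuple

class Statement(NamedTuple):
    subj: Any    # the resource being described
    pred: str    # the property
    obj: Any     # a value, another resource, or another Statement

PAGE = "http://www.cs.vu.nl/frankh"
X = "_:X"        # hypothetical label for the virtual resource X

# Author(PAGE) = Frank
simple = Statement(PAGE, "Author", "Frank")

# A structured value: Author(PAGE) = X, Name(X) = Frank, Email(X) = ...
structured = [
    Statement(PAGE, "Author", X),
    Statement(X, "Name", "Frank"),
    Statement(X, "Email", "frankh@cs.vu.nl"),
]

# A statement about a statement: Claim(Dieter) = (Author(PAGE) = Frank)
claim = Statement("Dieter", "Claim", simple)

def value_of(statements, pred, subj):
    """Return the value of pred for subj, or None if nothing is stated."""
    for statement in statements:
        if statement.pred == pred and statement.subj == subj:
            return statement.obj
    return None

author_name = value_of(structured, "Name", value_of(structured, "Author", PAGE))
```

Because a statement is itself a value, it can appear in the value position of another statement, which is all the meta-level facility requires in this simplified view.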

RDF Schema (RDFS) [Brickley et al., 1998] provides a basic type schema
for RDF based on core classes, core property types and core constraints.

Three core classes are provided by the RDF Schema machinery: 

• Resource (i.e., the class of all subjects); 

• PropertyType (i.e., the class of all predicates); and

• Class (i.e., the class of all values of predicates). 



[Figure: diagram relating XML, RDF (Resource Description Framework), and the
applications built on them, including PICS 2, DSig, and other XML applications]



Fig. 5 The Resource Description Framework 



3 I will skip the awkward syntax of RDF, because simple tooling can easily present it in a 
more common format such as shown here. 

Core property types of RDFS are:

• instanceOf and subClassOf. instanceOf defines a relationship between a 
resource and an element of Class, and subClassOf defines a relationship 
between two elements of Class. subClassOf is assumed to be transitive. 

• Constraint is a subclass of PropertyType. It has the two core instances 
range and domain applicable to property types having a class as value. 
Range and domain define the range and domain of property types 
respectively. 
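
The consequence of treating subClassOf as transitive can be sketched in a few lines. The example below is plain Python; the class names Professor, Researcher, and Person and the function names are hypothetical and do not come from RDFS.

```python
def all_superclasses(cls, sub_class_of):
    """Collect every class reachable from cls through subClassOf links."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found

def is_instance_of(direct_class, target, sub_class_of):
    """A resource of direct_class also belongs to all its superclasses."""
    return target == direct_class or target in all_superclasses(
        direct_class, sub_class_of
    )

# Directly stated subClassOf links (hypothetical vocabulary).
SUB_CLASS_OF = {"Professor": {"Researcher"}, "Researcher": {"Person"}}

supers = all_superclasses("Professor", SUB_CLASS_OF)
professor_is_person = is_instance_of("Professor", "Person", SUB_CLASS_OF)
```

Only two links are stated explicitly; the third (Professor as a subclass of Person) follows from transitivity.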

XML, XSL, and RDF are complementary technological means that will 
enable ontological support in knowledge management and electronic 
commerce. XML provides a standard serial syntax for exchanging data. In 
consequence, ontology-based data and information exchange can abstract 
from these aspects. A DTD allows us to define the structure and elementary 
tags of an XML document. We will see later how such a DTD can be 
generated from an ontology and vice versa. XSL allows us to translate 
between different XML documents, i.e., documents relying on different 
DTDs. Finally, RDF provides a standard for describing machine-processable 
semantics of data. The relationships between new and forthcoming Web 
standards on the one hand and ontology languages on the other hand will be 
discussed later. 



3.3 Ontology Languages 

We will discuss some ontology languages that are well known in the 
community and that are prototypical of a specific language paradigm. These 
are: 



• CycL and KIF [Genesereth, 1991] as representatives of enriched first-order
predicate logic languages.

• Ontolingua [Farquhar et al., 1997] and Frame Logic [Kifer et al., 1995]
as representatives of frame-based approaches. Both incorporate frame-based
modeling primitives in a first-order logical framework, but they
apply very different strategies for this.

• Description logics that describe knowledge in terms of concepts and role 
restrictions used to automatically derive classification taxonomies. 


3.3.1 Predicate Logic Languages: CycL and KIF

CycL was developed in the Cyc project [Lenat & Guha, 1990] for the purpose 
of specifying the large common-sense ontology that should provide artificial 
intelligence to computers. Far from having attained this goal, Cyc still 
provides the world’s largest formalized ontology. CycL is a formal language 
whose syntax is derived from first-order predicate calculus. However, CycL 
extends first-order logic through the use of second-order concepts. Predicates
are also treated as constants in expressions. The vocabulary of CycL consists 
of terms: semantic constants, non-atomic terms, variables, numbers, strings, 
etc. Terms are combined in CycL expressions, ultimately forming closed 
CycL sentences (with no free variables). A set of CycL sentences forms a 
knowledge base. In the following, we will discuss the main concepts of CycL. 
More details can be found on its homepage. 4 

Constants are the vocabulary of the CycL language; more precisely, they are 
the “words” that are used in writing the axioms (i.e., the closed formulas) that 
comprise the content of any CycL knowledge base. Constants may denote (1) 
individuals, (2) collections of other concepts (i.e., sets which correspond to 
unary predicates), or (3) arbitrary predicates that enable the expression of 
relationships among other constants and functions. Constants must have 
unique names. 

Predicates express relationships between terms. The type of each argument
of each predicate must be specified; that is, the appropriate formulas must be
asserted to be true, i.e., if p(A_0) is declared with A_0 of type T, then p(c)
implies that c is an element of T.

Variables stand for terms (e.g., constants) or formulas whose identities are 
not specified. A variable may appear anywhere that a term or formula can 
appear. 

Formulas combine terms into meaningful expressions. Each formula has the
structure of a parenthesized list. That is, it starts with a left parenthesis,
followed by a series of objects which are commonly designated A_0, A_1, A_2,
etc., and it ends with a corresponding right parenthesis. The object in the A_0
position may be a predicate, a logical connective, or a quantifier. The
remaining arguments may be terms (e.g., constants, non-atomic terms,
variables, numbers, strings delimited by double quotes (“...”)), or other
formulas. Note the recursion (i.e., the second-order syntax) here; A_i in one
formula might itself be an entire CycL formula. Each atomic formula must
begin with a predicate or a variable in order to be well-formed. The simplest



4 http://www.cyc.com/cycl.html
kind of formula is one in which the A_0 position is occupied by a predicate and all
the other argument positions are filled with terms (or variables):

(likesAsFriend DougLenat KeithGoolsbey) 

(colorOfObject ?CAR ?COLOR)

The first formula above is called a ground atomic formula, since none of 
the terms filling the argument positions are variables. The second formula is 
not a ground atomic formula; it refers to the variables ?CAR and ?COLOR.
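
The distinction can be checked mechanically once formulas are written as nested lists. The sketch below is plain Python and not part of any Cyc software; it assumes that formulas are represented as tuples and that variables are strings starting with "?".

```python
def is_variable(term):
    """Variables are written with a leading question mark, e.g. ?CAR."""
    return isinstance(term, str) and term.startswith("?")

def is_ground(formula):
    """True if no argument position (at any depth) holds a variable."""
    for argument in formula[1:]:
        if isinstance(argument, tuple):       # a nested formula or term
            if not is_ground(argument):
                return False
        elif is_variable(argument):
            return False
    return True

first = ("likesAsFriend", "DougLenat", "KeithGoolsbey")
second = ("colorOfObject", "?CAR", "?COLOR")

first_is_ground = is_ground(first)
second_is_ground = is_ground(second)
```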

Logical connectives are used to build more complex formulas from atomic 
formulas (and/or other complex formulas). The most important logical 
connectives are and, or, and not. New connectives can be introduced simply 
by inserting a formula to that effect into the knowledge base; thus 

(isa new-connector Connective). 

Complex Formulas. We can compose the above connectives, of course, and 
have complex expressions such as 

(and ... (or ... (xor A (and ...)) ...) ...)

Quantification comes in two main flavors: universal quantification and 
existential quantification. Universal quantification corresponds to 
expressions like every, all, always, everyone, and anything, while existential
quantification corresponds to expressions like someone, something, and
somewhere. CycL contains one universal quantifier, forAll, and four
existential quantifiers: thereExists, thereExistAtLeast, thereExistAtMost, and
thereExistExactly. Additional quantifiers can be introduced by making the
appropriate assertions - declaring the new quantifier to be an instance of 
Quantifier, and giving a definition of it, probably in terms of existing 
quantifiers, predicates, and collections. To be considered a closed sentence - 
a well-formed formula - all the variables in an expression need to be bound 
by a quantifier before they are used. 
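
Whether an expression is closed can be decided by collecting its free variables. The following is a minimal sketch in plain Python; it assumes formulas are nested tuples, variables start with "?", and only the two-argument quantifiers forAll and thereExists are handled (the counting quantifiers are left out for brevity).

```python
QUANTIFIERS = {"forAll", "thereExists"}   # simplification: two quantifiers

def free_variables(formula, bound=frozenset()):
    """Return the set of variables not bound by an enclosing quantifier."""
    if isinstance(formula, str):
        is_free = formula.startswith("?") and formula not in bound
        return {formula} if is_free else set()
    head = formula[0]
    if head in QUANTIFIERS:
        variable, body = formula[1], formula[2]
        return free_variables(body, bound | {variable})
    free = set()
    for part in formula:                  # the head may itself be a variable
        free |= free_variables(part, bound)
    return free

open_formula = ("colorOfObject", "?CAR", "?COLOR")
closed_sentence = ("forAll", "?CAR", ("thereExists", "?COLOR", open_formula))

open_vars = free_variables(open_formula)
closed_vars = free_variables(closed_sentence)
```

An expression counts as a closed sentence in this sketch exactly when the returned set is empty.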

Second-order Quantification. Quantification is also allowed over 
predicates, functions, arguments, and formulas. 

Functions. Like most predicates, most functions have a fixed arity. For each
function, assertions that specify the type of each argument must be entered
into the CycL knowledge base.

Microtheories ([Lenat & Guha, 1990], [Guha, 1993]). A microtheory, or
context ([McCarthy 1993], [Lenat, submitted]), is a set of formulas in the
knowledge base. Each formula must be asserted in at least one microtheory.
Microtheories are fully reified objects, and thus they can not only contain 
CycL formulas, but also participate in CycL formulas. 

Each formula has an associated truth value (in each microtheory). CycL 
contains five possible non-numeric truth values, of which the most common 
are default true and monotonically true. The other truth values are default 
false, monotonically false, and unknown. In addition, CycL accommodates 
Bayesian probabilities and dependencies, and (separately) fuzzy truth values, 
attributes, and sets. All CycL-compliant systems must support at least one 
“true” and one “false” value. 

• Monotonically true means true with no exceptions. Assertions which 
are monotonically true are held to be true in every case, that is, for every 
possible set of bindings - not just currently known bindings - to the 
universally quantified variables (if any) in the assertion and cannot be 
overridden. 

• Assertions which are default true, in contrast to monotonically true, can 
have exceptions. They are held to be true only in most cases (usually 
meaning most of the relevant cases likely to be encountered in the 
current context) and can be overridden without needing to alert the user. 
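
The difference between the two truth values can be pictured with a toy model. The sketch below is plain Python and does not describe how Cyc is implemented; the class Assertion, its fields, and the bird example are assumptions chosen only to show that default assertions admit exceptions while monotonic ones do not.

```python
from dataclasses import dataclass, field

@dataclass
class Assertion:
    statement: str
    monotonic: bool                       # True: monotonically true
    exceptions: set = field(default_factory=set)

    def add_exception(self, individual):
        """Override the assertion for one individual, if that is allowed."""
        if self.monotonic:
            raise ValueError("monotonically true assertions have no exceptions")
        self.exceptions.add(individual)

    def holds_for(self, individual):
        return individual not in self.exceptions

birds_fly = Assertion("every bird flies", monotonic=False)     # default true
birds_are_animals = Assertion("every bird is an animal", monotonic=True)

birds_fly.add_exception("Tweety")         # silently overrides the default
tweety_flies = birds_fly.holds_for("Tweety")
tweety_is_animal = birds_are_animals.holds_for("Tweety")
```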

In a nutshell, CycL uses predicate logic extended by typing (i.e. functions 
and predicates are typed), reification (i.e. predicates and formulas are treated 
as terms and can be used as expressions within other formulas), and 
microtheories that define a context for the truth of formulas. 

The Knowledge Interchange Format (KIF) [Genesereth & Fikes, 1992] is 
a language designed for use in the exchange of knowledge between disparate 
computer systems (created by different programmers at different times, in
different languages, etc.). Different computer systems can interact with their 
users in whatever forms are most appropriate to their applications. Being a 
language for knowledge interchange, KIF can also be used as a language for 
expressing and exchanging ontologies. 5 The following categorical features 
are essential to the design of KIF. 

• The language has declarative semantics. 

• The language is logically comprehensive: at its most general it provides
for the expression of arbitrary logical sentences. In this way, it differs
from relational database languages (like SQL) and logic programming
languages (like Prolog).

• The language provides a means for the representation of knowledge
about knowledge. This allows the user to make knowledge representation
decisions explicit and to introduce new knowledge
representation constructs without changing the language.

5 Actually, KIF was not presented in this way because its origins are older than the current
O-hype.

Semantically, there are four categories of constants in KIF: object
constants, function constants, relation constants, and logical constants. Object
constants are used to denote individual objects. Function constants denote 
functions on those objects. Relation constants denote relations. Logical 
constants express conditions about the world and are either true or false. KIF 
is unusual among logical languages in that there is no syntactic distinction 
between these four types of constants; any constant can be used where any 
other constant can be used. The differences between these categories of 
constants are entirely semantic. This feature reifies second-order features in 
KIF. It is possible to make statements about statements. 

There are three disjoint types of expressions in the language: terms, 
sentences, and definitions. Terms are used to denote objects in the world 
being described, sentences are used to express facts about the world, and 
definitions are used to define constants. A knowledge base is a finite set of 
sentences and definitions. 

There are six types of sentences. 

sentence ::= constant | equation | inequality | relsent |
             logsent | quantsent

We have already mentioned constants. An equation consists of the “=” 
operator and two terms. An inequality consists of the “/=” operator and two
terms. An implicit relational sentence consists of a constant and an arbitrary 
number of argument terms terminated by an optional sequence variable. 

The syntax of logical sentences depends on the logical operator involved. 
A sentence involving the not operator is called a negation. A sentence 
involving the and operator is called a conjunction and the arguments are 
called conjuncts. A sentence involving the or operator is called a disjunction 
and the arguments are called disjuncts. A sentence involving the “=>” 
operator is called an implication; all of its arguments but the last are called 
antecedents and the last argument is called the consequent. A sentence 
involving the “<=” operator is called a reverse implication; its first argument 
is called the consequent and the remaining arguments are called the 
antecedents. A sentence involving the “<=>” operator is called an 
equivalence. 
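
The naming conventions for logical sentences can be captured in a small classifier. The sketch below is plain Python and not a KIF parser; it assumes sentences are given as tuples whose first item is the operator, and the function name describe and the constants p, q, r are our own.

```python
KINDS = {
    "not": "negation",
    "and": "conjunction",
    "or": "disjunction",
    "=>": "implication",
    "<=": "reverse implication",
    "<=>": "equivalence",
}

def describe(sentence):
    """Name the kind of a logical sentence and the roles of its arguments."""
    operator, *arguments = sentence
    kind = KINDS[operator]
    if operator == "=>":      # all arguments but the last are antecedents
        roles = {"antecedents": arguments[:-1], "consequent": arguments[-1]}
    elif operator == "<=":    # the first argument is the consequent
        roles = {"consequent": arguments[0], "antecedents": arguments[1:]}
    else:
        roles = {"arguments": arguments}
    return kind, roles

forward = describe(("=>", "p", "q", "r"))
backward = describe(("<=", "r", "p", "q"))
```

Both example sentences have the same antecedents and the same consequent; only the position of the consequent differs.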

There are two types of quantified sentences - a universally quantified
sentence is signaled by the use of the forall operator, and an existentially 
quantified sentence is signaled by the use of the exists operator. The first 
argument in each case is a list of variable specifications. Note that according 
to these rules it is permissible to write sentences with free variables, 6 i.e. 
variables that do not occur within the scope of any enclosing quantifier. 

Finally, there are three types of definitions - unrestricted, complete, and 
partial. Within each type there are four cases, one for each category of 
constant. For more details see the KIF homepage. 7 

KIF and CycL have features in common. Both languages are based on
predicate logic. Also, both provide an important extension of first-order
logic. They allow the reification of formulas as terms used in other formulas. 
Therefore, KIF and CycL allow meta-level statements. In addition to this, 
CycL provides richer modeling primitives than KIF (e.g., various quantifiers 
and microtheories). This stems from the fact that CycL is a modeling 
language for ontologies whereas KIF was designed as an exchange format for 
ontologies. As I will discuss later, both languages are close in spirit to RDF. 
Second-order elements (i.e., formulas as terms in meta-level formulas) and 
global scope of properties (i.e., predicates) are common features. 



3.3.2 Frame-based Approaches: Ontolingua and Frame Logic 

The central modeling primitives of predicate logic are predicates. Frame-based
and object-oriented approaches take a different point of view. Their
central modeling primitives are classes (i.e., frames) with certain properties
called attributes. These attributes do not have a global scope but are only 
applicable to the classes which they are defined for (they are typed), and the 
“same” attribute (i.e., the same attribute name) may be associated with 
different range and value restrictions when defined for different classes. In 
the following, we will discuss two frame-oriented approaches: Ontolingua 
(see [Gruber, 1993], [Farquhar et al., 1997]) and Frame Logic [Kifer et al., 
1995]. 

Ontolingua 8 was designed to support the design and specification of 
ontologies with a clear logical semantics based on KIF. Ontolingua extends 
KIF using additional syntax to include the intuitive bundling of axioms into 
definitional forms with ontological significance and a frame ontology to 



6 Very different from CycL, where free variables are forbidden. 

7 http://logic.stanford.edu/kif/kif.html 

8 http://ontolingua.stanford.edu 
define object-oriented and frame-language terms. 9 The Frame-Ontology
defines the set of KIF expressions that Ontolingua allows. This specifies the 
representation primitives that are often supported by special-purpose syntax 
and code in object-centered representation systems (e.g., classes, instances, 
slot constraints). Ontolingua definitions are Lisp-style forms that associate a 
symbol with an argument list, a documentation string, and a set of KIF 
sentences labeled by keywords. An Ontolingua ontology is made up of 
definitions of classes, relations, functions, distinguished objects, and axioms 
that relate these terms. 

A relation is defined with a form like the following: 

(define-relation name (?A1 ?A2)
    :def (KIF formula))

The arguments ?A1 and ?A2 are universally quantified variables ranging
over the items in the tuples of the relation. This example is a binary relation, 
so each tuple in the relation has two items. Relations of greater arity can also 
be defined. The sentence after the :def keyword is a KIF sentence stating 
logical constraints over the arguments. Constraints on the value of the first 
argument of a binary relation are domain restrictions, and those on the second 
argument of a binary relation are range restrictions. There may also be 
complex expressions stating relationships among the arguments of the 
relation. The :def constraints are necessary conditions, which must hold if 
the relation holds over some arguments. It is also possible to state sufficient 
conditions or any combination. 
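
The role of :def constraints as necessary conditions can be illustrated with a toy check. The sketch below is plain Python rather than Ontolingua; the relation author-of, the sets PERSONS and DOCUMENTS, and the function names are hypothetical.

```python
# Hypothetical unary relations (classes) used as domain and range.
PERSONS = {"Frank", "Dieter"}
DOCUMENTS = {"homepage", "book"}

def author_of_def(a1, a2):
    """Necessary condition for (author-of ?A1 ?A2): domain and range."""
    domain_ok = a1 in PERSONS        # domain restriction on the first argument
    range_ok = a2 in DOCUMENTS       # range restriction on the second argument
    return domain_ok and range_ok

def violations(tuples, necessary):
    """Return the tuples of a relation that break its necessary condition."""
    return [items for items in tuples if not necessary(*items)]

author_of = [("Frank", "homepage"), ("Dieter", "book"), ("book", "Dieter")]
broken = violations(author_of, author_of_def)
```

A necessary condition only rules tuples out. Satisfying it does not by itself put a tuple into the relation, which is why sufficient conditions are stated separately.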

A class is defined by a similar form with exactly one argument called the 
instance variable. In Ontolingua, classes are treated as unary relations to help 
unify object- and relation-centered representation styles. 

A function is defined like a relation. A slight variation in syntax moves 
the final argument outside of the argument list. As in definitions of relations, 
the arguments to a function are constrained with necessary conditions 
following the :def keyword.

Finally, it is possible to define individuals in an ontology. 

The Frame-Ontology is expressed as second-order axioms in Ontolingua. 
It contains a complete axiomatization of classes and instances, slots and slot 
constraints, class and relation specialization, relation inverses, relation 
composition, and class partitions. Each second-order term is defined with 
KIF axioms. A list of the Frame-Ontology vocabulary is given in Figure 6. 



9 The Ontolingua server as described in [Farquhar et al., 1997] has extended the original 
language by providing explicit support for building ontological modules that can be assembled, 
extended, and refined in a new ontology. 
class relation (?relation)
class function (?function)
class class (?class)
relation instance-of (?individual ?class)
function all-instances (?class) :-> ?set-of-instances
function one-of (@instances) :-> ?class
relation subclass-of (?child-class ?parent-class)
relation superclass-of (?parent-class ?child-class)
relation subrelation-of (?child-relation ?parent-relation)
relation direct-instance-of (?individual ?class)
relation direct-subclass-of (?child-class ?parent-class)
function arity (?relation) :-> ?n
function exact-domain (?relation) :-> ?domain-relation
function exact-range (?relation) :-> ?class
relation total-on (?relation ?domain-relation)
relation onto (?relation ?range-class)
class n-ary-relation (?relation)
class unary-relation (?relation)
class binary-relation (?relation)
class unary-function (?function)
relation single-valued (?binary-relation)
function inverse (?binary-relation) :-> ?relation
function projection (?relation ?column) :-> ?class
function composition (?relation-1 ?relation-2) :-> ?binary-relation
relation composition-of (?binary-relation ?list-of-relations)
function compose (@binary-relations) :-> ?binary-relation
relation alias (?relation-1 ?relation-2)
relation domain (?relation ?class)
relation domain-of (?domain-class ?binary-relation)
relation range (?relation ?class)
relation range-of (?class ?relation)
relation nth-domain (?relation ?integer ?domain-class)
relation has-value (?domain-instance ?binary-relation ?value)
function all-values (?domain-instance ?binary-relation) :-> ?set-of-values
relation value-type (?domain-instance ?binary-relation ?class)
function value-cardinality (?domain-instance ?binary-relation) :-> ?n
relation same-values (?domain-instance ?relation-1 ?relation-2)
relation inherited-slot-value (?domain-class ?binary-relation ?value)
function all-inherited-slot-values (?domain-class ?binary-relation) :-> ?set-of-values
relation slot-value-type (?domain-class ?binary-relation ?range-class)
function slot-cardinality (?domain-class ?binary-relation) :-> ?n
relation minimum-slot-cardinality (?domain-class ?binary-relation ?n)
relation maximum-slot-cardinality (?domain-class ?binary-relation ?n)
relation single-valued-slot (?domain-class ?binary-relation)
relation same-slot-values (?domain-class ?relation-1 ?relation-2)
class class-partition (?set-of-classes)
relation subclass-partition (?c ?class-partition)
relation exhaustive-subclass-partition (?c ?class-partition)
relation asymmetric-relation (?binary-relation)
relation antisymmetric-relation (?binary-relation)
relation antireflexive-relation (?binary-relation)
relation irreflexive-relation (?binary-relation)
relation reflexive-relation (?binary-relation)
relation symmetric-relation (?binary-relation)
relation transitive-relation (?binary-relation)
relation weak-transitive-relation (?binary-relation)
relation one-to-one-relation (?binary-relation)
relation many-to-one-relation (?binary-relation)
relation one-to-many-relation (?binary-relation)
relation many-to-many-relation (?binary-relation)
relation equivalence-relation (?binary-relation)
relation partial-order-relation (?binary-relation)
relation total-order-relation (?binary-relation)
relation documentation (?object ?string)



Fig. 6 The Frame-Ontology of Ontolingua (see [Gruber, 1993]) 



Frame Logic [Kifer et al., 1995] is a language for specifying object- 
oriented databases, frame systems, and logical programs. Its main 
achievement is to integrate conceptual modeling constructs (classes, 
attributes, domain and range restrictions, inheritance, axioms) into a coherent 
logical framework. Basically it provides classes, attributes with domain and 
range definitions, is-a hierarchies with set inclusion of subclasses and 
multiple attribute inheritance, and logical axioms that can be used to further 
characterize the relationships between elements of an ontology and its 
instances. 

The alphabet of an F-logic language consists of a set of function symbols 
and a set of variables. A term is a normal first-order term composed of 
function symbols and variables, as in predicate calculus. A language in F- 
logic consists of a set of formulas constructed from the alphabet symbols. As 
in many other logics, formulas are built from simpler formulas by using the 
usual connectives not, and, and or and the quantifiers forall and exists. The 
simplest kind of formulas are called molecular F-formulas. A molecule in F- 
logic is one of the following statements: 
• An assertion of the form C :: D or of the form O : C, where C, D, and O
are terms. The first expression models the subclass relationship and the
second models the is-element-of relationship.

• An object molecule of the form

O[a ";"-separated list of method expressions].

A method expression can be either a data expression or a signature
expression. O is a term denoting an object (which may refer to an
instance or a class); the bracketed list further specifies properties of
this object.

• A data expression can have one of the following two forms:

scalar expression

ScalarMethod @ Q1,...,Qk -> T

set-valued expression

SetMethod @ R1,...,Rl ->> {S1,...,Sm}

Data expressions specify that the method m applied to the object O
and the parameters returns the value T. They can be either
single-valued or may return a set.

• A signature expression can also take two forms:

scalar signature expression

ScalarMethod @ V1,...,Vn => (A1,...,Ar)

set-valued signature expression

SetMethod @ W1,...,Ws =>> (B1,...,Bt)

Signature expressions define types for applying methods (i.e.,
attributes) to objects. A method m applied to the object O and the
parameters must return a value that is an element/subclass of
A1,...,Ar.

F-formulae are built of simpler F-formulae in the usual manner by means 
of logical connectives and quantifiers. 
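
The interplay of the molecules above can be sketched in a few lines of Python. This is a strongly simplified illustration, not an F-logic implementation: subclass assertions (C :: D) and membership assertions (O : C) are stored in dictionaries, and a signature expression is used to type-check a data expression. The class names, the object mary, the attribute age, and the use of Python's int as the result type are all assumptions made for the example.

```python
from typing import Dict, Set

# C :: D (subclass) and O : C (membership), stored as plain dictionaries.
subclass_of: Dict[str, Set[str]] = {"employee": {"person"}, "person": set()}
member_of: Dict[str, Set[str]] = {"mary": {"employee"}}

def superclasses(cls: str) -> Set[str]:
    """All classes reachable via ::, including cls itself."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_member(obj: str, cls: str) -> bool:
    """O : C holds if O belongs to C directly or to one of its subclasses."""
    return any(cls in superclasses(c) for c in member_of.get(obj, set()))

# Signature expression person[age => integer]; data expression mary[age -> 35].
signatures = {("person", "age"): int}
data = {("mary", "age"): 35}

def well_typed(obj: str, method: str) -> bool:
    """Check a data expression against every signature the object inherits."""
    value = data[(obj, method)]
    return all(isinstance(value, typ)
               for (cls, m), typ in signatures.items()
               if m == method and is_member(obj, cls))

print(is_member("mary", "person"), well_typed("mary", "age"))
```

Note that mary is never declared a person directly; the signature on person applies to her only because membership is propagated along the subclass assertion.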

Ontolingua and Frame Logic integrate frames (i.e., classes) into a logical 
framework. The main difference between Ontolingua and Frame Logic is the 
manner in which they realize frame-based modeling primitives in a logical 
language. Ontolingua characterizes the frame-based modeling primitives via 
axioms in the language. Frame Logic instead defines their semantics externally,
through an explicit definition in its own semantics. To put it simply, Ontolingua
applies the standard semantics of predicate logic and uses axioms in this logic to
exclude models that do not fit the semantics of its modeling primitives. Frame Logic
provides a more complex semantics than predicate logic, in which the
modeling primitives are explicitly defined. A
second difference between Frame Logic and Ontolingua arises from the fact
that Ontolingua inherits the powerful reification mechanism from KIF, which
allows the use of formulas as terms of (meta-level) formulas. In Frame Logic,
predicate names can be bound to variables, but entire formulas cannot.



3.3.3 Description Logics 

The main thrust of research in knowledge representation is directed at 
providing theories and systems for expressing structured knowledge and for 
accessing and reasoning with it in a principled way. Description logics (see 
[Brachman & Schmolze, 1985], [Baader et al., 1991]), also known as 
terminological logics, form an important and powerful class of logic-based
knowledge representation languages. 10 They stem from early work in 
semantic networks and define a formal and operational semantics for them. 
Description Logics try to find a fragment of first-order logic with high 
expressive power which still has a decidable and efficient inference 
procedure (see [Muslea et al., 1998]). Systems implemented include BACK, 
CLASSIC, CRACK, FLEX, K-REP, KL-ONE, KRIS, LOOM, and YAK. 11 

A distinguishing feature of description logics is that classes (usually called 
concepts) can be defined intensionally in terms of descriptions that specify 
the properties that objects must satisfy in order to belong to the concept. 
These descriptions are expressed using a language that allows the 
construction of composite descriptions, including restrictions on the binary 
relationships (usually called roles) connecting objects. 

Figure 7 provides the syntax definition of the core language of CLASSIC. 
Its main modeling primitives are concept expressions and individual 
expressions (see [Borgida et al., 1989]). A CLASSIC database is for the most 
part a repository of information about individual objects. Objects have an 
intrinsic identity and are related to each other through binary relationships; 
these are called roles (elsewhere known as attributes or properties). 
Individuals will be grouped into collections indirectly by means of 
descriptions that apply to all members of a collection. We will call these 
descriptions concepts or classes. The data definition language allows the 
definition of concepts either by grouping individuals together extensionally, 
or grouping individuals implicitly through the use of intensional descriptions 
in regard to their structure. Complex CLASSIC concepts are formed by 
composing expressions using a small set of constructors. 



10 Links to most papers, projects, and research events in this area can be found at
http://dl.kr.org.

11 http://www.research.att.com/sw/tools/classic/imp-systems.html
The simplest kind of description you can form in CLASSIC is a primitive
concept. Primitive concepts are simple but not necessarily atomic; each 
primitive concept, except for the topmost one (which we call THING), is 
expected to have at least one parent (more general) concept. The simplest 
kind of primitive is one whose only parent is essentially vacuous, namely 
THING. For example, the concept of a CAR might be defined in this way: 

(PRIMITIVE THING car) 12

This expression means that whatever it designates is simply a type of 
THING with some unspecified difference from THING in general. This is 
quite the opposite of the case with other (non-primitive) concepts, as we shall 
see in a moment. 



<concept-expr> ::=
    THING | CLASSIC-THING | HOST-THING |    [these three are built-in primitives]
    <concept-name> |
    (AND <concept-expr>+) |    [see note 1]
    (ALL <role-expr> <concept-expr>) |
    (AT-LEAST <positive-integer> <role-expr>) |
    (AT-MOST <non-negative-integer> <role-expr>) |
    (SAME-AS (<role-expr>+) (<role-expr>+)) |
    (TEST <fn> <realm>) |
    (ONE-OF <individual-name>+) |
    (PRIMITIVE <concept-expr> <index>) |
    (DISJOINT-PRIMITIVE <concept-expr> <partition-index> <index>)

<individual-expr> ::=
    <concept-expr> |
    (FILLS <role-expr> <individual-name>) |
    (CLOSE <role-expr>) |
    (AND <individual-expr>+)

<realm> ::= host | classic
<concept-name> ::= <symbol>
<individual-name> ::= <symbol> | <host-lang-expr>
<role-expr> ::= <symbol>
<index> ::= <number> | <symbol>
<partition-index> ::= <number> | <symbol>
<fn> ::= a unary function with boolean return type that can be
    evaluated in the host language.

Note 1: “+” means one or more values separated by blanks.

Fig. 7 The grammar of the CLASSIC language (taken from [Borgida et al., 1989]) 
12 This example is taken from [Borgida et al., 1989]. 
Primitives can also have non-trivial parents. Thus, SPORTS-CAR might
be defined as a subconcept of both CAR and another concept,
EXPENSIVE-THING:

(PRIMITIVE (AND CAR EXPENSIVE-THING) sports-car).

In fact, the parent of a primitive concept can be any CLASSIC concept, 
including another primitive. Primitives thus specify necessary conditions: if 
Corvette1 is an instance of SPORTS-CAR, then it is both a CAR and an
EXPENSIVE-THING. But note that there is no sufficiency condition 
specified for primitive concepts. 

The CLASSIC language of concepts allows us to go substantially beyond 
the simple is-a hierarchies of more traditional semantic data models. It offers 
three special ways of describing objects in terms of their structure. As we 
shall see, these constructors allow some class membership relations to be
determined by inference. CLASSIC's three complex constructors are role
value restrictions, cardinality bounds, and co-reference constraints. Role
value restrictions are type constraints that hold for the fillers for some single
role. For example, the concept expression

(ALL thing-driven CAR) 

describes any object that is related by the thing-driven role solely to 
individuals describable by the concept CAR. 

Bounds restrict the number of fillers for roles. For example, 

(AT-MOST 4 thing-driven) 

describes any object that is related to at most 4 distinct individuals through 
the thing-driven role, while 

(AT-LEAST 3 wheel) 

describes any object that is related to at least 3 distinct individuals through 
the wheel role. 

Co-reference constraints specify simple equalities between single-valued 
roles or, more generally, chains of such roles. For example, the expression 

(SAME-AS (driver) (insurance payer))

describes all those individuals whose filler for the driver role is the same as 
for the insurance payer role. 

Each of the constructors acts as part of both necessary and sufficient 
conditions for concepts in which they appear (as long as they are not used in a 
primitive concept, in which case there are no sufficient conditions).
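
The three constructors can be read as simple tests on the role fillers of an individual. The following Python sketch illustrates this reading under one important assumption: all fillers of every role are known, which in CLASSIC would correspond to roles that have been closed with CLOSE. Without that assumption, ALL and AT-MOST cannot be decided from the known fillers alone. The individuals, the lookup table instance_of, and the simple role payer (used instead of a role chain) are hypothetical.

```python
from typing import Dict, List, Set

# An individual is described by its role fillers; concept membership of the
# fillers comes from a lookup table. All names here are illustrative.
Fillers = Dict[str, List[str]]
instance_of: Dict[str, Set[str]] = {
    "corvette-1": {"CAR", "EXPENSIVE-THING"},
    "beetle-1": {"CAR"},
}

def all_(ind: Fillers, role: str, concept: str) -> bool:
    """(ALL role concept): every filler of role belongs to concept."""
    return all(concept in instance_of.get(f, set()) for f in ind.get(role, []))

def at_least(ind: Fillers, n: int, role: str) -> bool:
    """(AT-LEAST n role): at least n distinct fillers."""
    return len(set(ind.get(role, []))) >= n

def at_most(ind: Fillers, n: int, role: str) -> bool:
    """(AT-MOST n role): at most n distinct fillers."""
    return len(set(ind.get(role, []))) <= n

def same_as(ind: Fillers, role_a: str, role_b: str) -> bool:
    """(SAME-AS (a) (b)) for two single-valued roles (no role chains here)."""
    a, b = ind.get(role_a, []), ind.get(role_b, [])
    return len(a) == 1 and a == b

record: Fillers = {
    "thing-driven": ["corvette-1", "beetle-1"],
    "driver": ["pat"],
    "payer": ["pat"],
}
print(all_(record, "thing-driven", "CAR"), at_most(record, 4, "thing-driven"))
```

Here record satisfies (ALL thing-driven CAR) and (AT-MOST 4 thing-driven), but not (ALL thing-driven EXPENSIVE-THING), because beetle-1 is not known to be an EXPENSIVE-THING.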

It is important to note that the meaning of concepts in CLASSIC is 
determined by their structure. This implies that certain relationships exist 
between concepts by virtue of their definition. For example, it is quite 
possible for several different concept expressions to denote the same class: 

(AND (ALL thing-driven CAR)
     (ALL thing-driven EXPENSIVE-THING))

is the same concept as

(ALL thing-driven
     (AND CAR EXPENSIVE-THING)).
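
One way to see why these two expressions coincide is to rewrite both into a common normal form in which all value restrictions on the same role are merged. The Python sketch below does this for a tiny fragment of the language (concept names, AND, and ALL only); it is an illustration of the idea, not the classification algorithm used by CLASSIC. Concept expressions are encoded as nested tuples, which is our own convention.

```python
from typing import Dict, FrozenSet, Tuple, Union

Concept = Union[str, tuple]

def normalize(expr: Concept) -> Tuple[FrozenSet[str], Tuple]:
    """Normal form: (set of concept names, sorted tuple of (role, normal form))."""
    names: set = set()
    by_role: Dict[str, list] = {}

    def collect(e: Concept) -> None:
        if isinstance(e, str):
            names.add(e)                      # atomic concept name
        elif e[0] == "AND":
            for part in e[1:]:                # flatten nested conjunctions
                collect(part)
        elif e[0] == "ALL":
            by_role.setdefault(e[1], []).append(e[2])
        else:
            raise ValueError(f"unsupported constructor: {e[0]}")

    collect(expr)
    # Merge all restrictions on the same role into one conjunction.
    roles = tuple(sorted(
        (role, normalize(("AND", *fillers))) for role, fillers in by_role.items()
    ))
    return frozenset(names), roles

first = ("AND", ("ALL", "thing-driven", "CAR"),
                ("ALL", "thing-driven", "EXPENSIVE-THING"))
second = ("ALL", "thing-driven", ("AND", "CAR", "EXPENSIVE-THING"))
print(normalize(first) == normalize(second))
```

Equal normal forms are enough to establish equivalence in this fragment; the other constructors of the language would need additional rules.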

Various studies have examined extensions of the expressive power of such 
a language and the trade-off in computational complexity for deriving is-a 
relationships between concepts and individuals in such a logic. Efficient 
implementations for core sets of primitives in these languages have been 
developed (see [Borgida & Patel-Schneider, 1994], [MacGregor, 1994], and 
[Horrocks & Patel-Schneider, 1999]); see, for example, DLP 13 and the FaCT
system. 14



3.4 XML, RDF, and Ontology Languages 

In this section we will examine how XML and RDF can be used to express 
ontologies. 



3.4.1 DTD and Ontologies 

On the one hand, ontologies and DTD/XML schemes serve very different 
purposes. Ontology languages are a means to specify domain theories and 
XML schemes are a means to provide integrity constraints for information 
sources (i.e., documents and/or semi-structured data). It is therefore not 
surprising to encounter differences when comparing XML schema with 
ontology languages. On the other hand, XML Schema and ontology 
languages have one main goal in common: both provide vocabulary and 
structure for describing information sources that are aimed at exchange. It is 
therefore legitimate to compare both and investigate their commonalities and 
differences. DTD and XML schema definitions define the legal nestings of 
tags and introduce attributes for them. Defining tags, their nesting, and 



13 http://www.bell-labs.com/user/pfps
14 http://www.cs.man.ac.uk/~horrocks/software.html
attributes for tags may be seen as defining an ontology. However, there are
significant differences between an ontology and a DTD. 

• First, a DTD specifies the legal lexical nesting in a document, which may 
or may not coincide with an ontological hierarchy (subclass 
relationship). That is, there is nothing in a DTD that corresponds to the 
is-a relationship of classes that is usually central in an ontology. 

• Second, and in consequence, DTDs lack any notion of inheritance. In an 
ontology, subclasses inherit attributes defined for their super-classes and 
superclasses inherit instances defined for their subclasses. These 
inheritance mechanisms do not exist for DTDs. 

• Third, DTDs provide a rather poor means for defining the semantics of 
elementary tags. Basically, a tag can be defined as being composed of 
other tags or being a string. Usually, ontologies provide a much richer 
typing concept for describing elementary types. 

• Fourth, DTDs define the order in which tags appear in a document. For 
ontologies, in contrast, the ordering of attribute descriptions does not 
matter. 

We will use an example to clarify these differences (see Fig. 8). 

• Concept c1 has two attributes, a1 and a2. This implies that the domains of
a1 and a2 are the elements of c1. The range of a1 is the intersection of c2
and c3 and the range of a2 is the union of c4 and c5.

• c2 is defined as a subclass of c1. This implies that all attributes defined
for c1 are also applicable for c2. In addition, each element of c2 is also an
element of c1.

• Finally c6 is a subclass of c4 and c1. Therefore it inherits the attributes a1
and a2 from c1. In addition, it refines the range restriction of attribute a1
to c2 and c3 and c4. That is, the value of a1 applied to an element of c6
must also be an element of c4. This is not necessary for an element of c1
that is not also an element of c6.

Ontology:

c1[a1 -> c2 and c3
   a2 -> c4 or c5]
c2 < c1
c6 < c4
c6 < c1
c6[a1 -> c4]

DTD:

<!ELEMENT c1 (c1.a1 | c1.a2)* >
<!ELEMENT c2 (c2.a1 | c2.a2)* >
<!ELEMENT c3 (#PCDATA)* >
<!ELEMENT c4 (#PCDATA)* >
<!ELEMENT c5 (#PCDATA)* >
<!ELEMENT c6 (c6.a1 | c6.a2)* >
<!ELEMENT c1.a1 (c2, c3)* >
<!ELEMENT c1.a2 (%c4 | c5) >
<!ELEMENT c2.a1 (c2, c3)* >
<!ELEMENT c2.a2 (%c4 | c5) >
<!ELEMENT c6.a1 (c2, c3, %c4)* >
<!ELEMENT c6.a2 (%c4, c5) >
<!ENTITY %c1 "c1 | c2 | c6">
<!ENTITY %c4 "c4 | c6">

Fig. 8 Translation from an ontology into a DTD

When translating this ontology into a DTD we first define c1 as an element
having two subtags c1.a1 and c1.a2, i.e.,

<c1>
  <c1.a1> ... </c1.a1>
  <c1.a2> ... </c1.a2>
</c1>

would be a valid document. Therefore, we reify the attribute names with the 
concept names to distinguish different appearances of attributes in various 
concepts. A number of problems arise in this translation process: 

• The sequence of attribute values of an object does not matter in an
ontology, i.e., o[a1=5, a2=3] and o[a2=3, a1=5] are equivalent. We
express this by (a1 | a2)*. However, this implies that an object may have
several values for the same attribute (which is allowed for set-valued but
not for single-valued attributes).

• For c1, the attribute a1 has as its range the intersection of c2 and c3. That
is, a value of the attribute is an object for which the attributes of c2 and c3
can be applied. We express this via (c2, c3)*, which again implies that an
object may have several values for the same attribute.

• The only primitive data type is PCDATA, i.e. arbitrary strings. 

We can also see the two aspects of inheritance in the translation process. 

• First, we have to add all inherited attributes and their inherited range 
restrictions explicitly. For example: 

<!ENTITY %c1 "c1 | c2 | c6">

• Second, the value of an attribute may also be the element of a subclass of 
its value type (i.e., a superclass inherits all elements of its subclasses). 
Therefore, whenever a class is used as a range restriction we have to add 
all its subclasses. For this we use the entity mechanism of DTDs. For 
example: 

<!ENTITY %c1 "c1 | c2 | c6">

More details and further aspects of ontology to DTD translations can be 
found in [Erdmann & Studer, 1999] and [Rabarijoana et al., 1999]. In a 
nutshell, DTDs are rather weak in regard to what can be expressed with them. 
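
The two aspects of inheritance in this translation can be sketched as a small Python generator. It is only an illustration of the idea for the example ontology: the dictionaries parents and own_attrs are our own encoding, range restrictions are left out, and the entity declarations follow the simplified notation of Figure 8 rather than strict DTD syntax (which writes parameter entities as <!ENTITY % name "..."> and references them as %name;).

```python
from typing import Dict, List

# Hypothetical encoding of the example ontology: direct superclasses and
# the attributes each class declares itself.
parents: Dict[str, List[str]] = {
    "c1": [], "c2": ["c1"], "c3": [], "c4": [], "c5": [], "c6": ["c4", "c1"],
}
own_attrs: Dict[str, List[str]] = {"c1": ["a1", "a2"]}

def ancestors(cls: str) -> List[str]:
    """cls followed by all of its superclasses, without duplicates."""
    out = [cls]
    for p in parents[cls]:
        out += [a for a in ancestors(p) if a not in out]
    return out

def attributes(cls: str) -> List[str]:
    """Top-down inheritance: own attributes plus those of all superclasses."""
    attrs: List[str] = []
    for c in ancestors(cls):
        attrs += [a for a in own_attrs.get(c, []) if a not in attrs]
    return attrs

def subclasses(cls: str) -> List[str]:
    """Bottom-up inheritance: cls plus every class that has it as an ancestor."""
    return [c for c in sorted(parents) if cls in ancestors(c)]

def element_decl(cls: str) -> str:
    """Element declaration with attribute names reified by the class name."""
    attrs = attributes(cls)
    if not attrs:
        return f"<!ELEMENT {cls} (#PCDATA)*>"
    body = " | ".join(f"{cls}.{a}" for a in attrs)
    return f"<!ELEMENT {cls} ({body})*>"

def entity_decl(cls: str) -> str:
    """Entity listing a class together with all of its subclasses."""
    members = " | ".join(subclasses(cls))
    return f'<!ENTITY %{cls} "{members}">'

for line in [element_decl("c6"), element_decl("c3"),
             entity_decl("c1"), entity_decl("c4")]:
    print(line)
```

The element declaration for c6 lists c6.a1 and c6.a2 although c6 declares no attributes itself, and the entity for c1 enumerates c1, c2, and c6; both facts have to be spelled out because the DTD has no notion of inheritance.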
Work on XML Schema (see [Malhotra & Maloney, 1999]) may well
contribute to bridging the gap between DTDs and ontologies. Schemas
introduce mechanisms for constraining document structure and content;
mechanisms to enable inheritance for element, attribute, and data type
definitions; mechanisms for application-specific constraints and descriptions;
mechanisms to enable the integration of structural schemas with primitive
data types; primitive data typing, including byte, date, integer, sequence, etc.;
and the creation of user-defined data types. A detailed comparison
of XML Schema and ontologies can be found in [Klein et al., 2003]. We will
discuss only one quite interesting aspect that is related to the different
treatment of inheritance in XML Schema and in an ontology.

XML Schema incorporates the notion of type derivation. However, this 
can only partially be compared with what is provided with inheritance in 
ontology languages. First, in XML Schema all inheritance has to be modeled 
explicitly. In ontologies, inheritance can be derived from the definitions of 
the concepts. Second, XML Schema does not provide a direct way to inherit 
from multiple parents. Types can only be derived from one base type. Most 
ontology languages provide multiple inheritance. Third, and very important, 
the is-a relationship has a twofold role in conceptual modeling which is not 
directly covered by XML Schema: 

• Top-down inheritance of attributes from superclasses to subclasses. 
Assume employee as a subclass of a class person. Then employee 
inherits all attributes that are defined for person. 

• Bottom-up inheritance of instances from subclasses to superclasses. 
Assume employee as a subclass of a class person. Then person inherits 
all instances (i.e., elements) that are an element of employee. 

In XML Schema, both aspects can only be modeled in an artificial way. 
The top-down inheritance of attributes is difficult to model, because type 
derivations in XML Schema can either extend or restrict the base type. A 
“dummy” intermediate type has to be used to model full top-down 
inheritance of attributes with both extending and restricting derivations. For 
example, it is not possible to model a student as a person with a student 
number and age < 40 in only one step. You first have to model a dummy type 
“young person”, which restricts the age of persons to less than 40. After that 
it is possible to model a student as a “young person” extended with a student 
number. 

The bottom-up inheritance of instances to superclasses is also not 
automatically available in XML Schema. For example, an instance of a 
student is not automatically a valid instance of a person, even if the student 
type inherits from the person type. 
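
The student example can be mimicked with a toy type system in Python. The sketch is not an XML Schema validator; RecordType and its methods are hypothetical, and the strict validation rule (exactly the declared fields) is an assumption chosen to mirror the behaviour described above. It shows why two derivation steps are needed and why a student record is not accepted as a person record.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class RecordType:
    """A toy record type: named fields plus value constraints."""
    fields: Dict[str, type]
    checks: List[Callable[[dict], bool]] = field(default_factory=list)

    def restrict(self, check: Callable[[dict], bool]) -> "RecordType":
        """Derivation by restriction: same fields, one more constraint."""
        return RecordType(dict(self.fields), self.checks + [check])

    def extend(self, name: str, typ: type) -> "RecordType":
        """Derivation by extension: one more field, same constraints."""
        return RecordType({**self.fields, name: typ}, list(self.checks))

    def accepts(self, value: dict) -> bool:
        """Strict validation: exactly the declared fields, all checks pass."""
        return (value.keys() == self.fields.keys()
                and all(isinstance(value[k], t) for k, t in self.fields.items())
                and all(check(value) for check in self.checks))

person = RecordType({"name": str, "age": int})
young_person = person.restrict(lambda v: v["age"] < 40)   # dummy intermediate type
student = young_person.extend("student_number", str)

fred = {"name": "Fred", "age": 35, "student_number": "s-1"}
print(student.accepts(fred), person.accepts(fred))
```

Each derivation step either restricts or extends, so student is reached only via young_person. The student record fred is rejected by person because of its extra field; bottom-up inheritance of instances would have to be added explicitly.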
Up to now we have discussed the mapping from ontologies to DTDs.
[Welty & Ide, 1999] discuss a mapping from DTDs to an ontological 
representation. Their aim is to provide the reasoning service of description 
logic to query and manipulate XML documents. DTDs are therefore 
translated automatically into a representation of an ontology in description 
logic. This ontology simply consists of each element in the DTD. The 
taxonomy can be derived by the classifier of the description logic CLASSIC 
based on the use of entities and type attributes. 



3.4.2 RDF and Ontologies

RDF and RDFS can be used directly to describe an ontology. Objects, 
classes, and properties can be described. Predefined properties can be used to 
model instance of and subclass of relationships as well as domain restrictions 
and range restrictions of attributes. A speciality of RDFS is that properties 
are defined globally and are not encapsulated as attributes in class definitions. 
Therefore, a frame or object-oriented ontology can only be expressed in 
RDFS by reifying the property names with class name suffixes (as we have 
already seen for XML). In regard to ontologies, RDF provides two important 
contributions: 

• a standardized syntax for writing ontologies; 

• a standard set of modeling primitives like instance of and subclass of 
relationships. 

On the one hand, RDFS provides rather limited expressive power. A 
serious weakness of RDF is that it lacks a standard for describing logical 
axioms. RDFS allows the definition of classes and properties through their 
types (by providing their names). No intensional definitions or complex 
relationships via axioms can be defined. On the other hand, RDFS provides a 
rather strong reification mechanism. RDF expressions can be used as terms in 
meta-expressions. Here, RDFS provides reified second-order logic as used in 
CycL and KIF. Neither Frame Logic nor most description logics provide such
expressivity. 15 However, there are good reasons for this restriction in the
latter approaches. This feature makes it very difficult to define a clean 
semantics in the framework of first-order logic and disables sound and 
complete inference services. 16 In the case of RDFS, such an inference service 
remains possible because of the otherwise restricted expressive power that 



15 Exceptions are described in [Calvanese et al., 1995] and [De Giacomo & Lenzerini, 1995], 
however, without an implemented reasoning service. 

16 The problems stem from the fact that, because terms in second-order logic may be arbitrary 
formulas, term unification in second-order logic (i.e., one simple sub step in deduction) 
requires full deduction in first-order logic which is undecidable in the general case. 
does not provide any rule language. That is, RDFS provides syntactical
features of second-order logic without actually requiring second-order 
semantics. 



3.4.3 Comparing RDF and XML 

RDF is an application of XML for the purpose of representing metadata. For 
example, the RDF statements: 

date(http://www.xyz.de/Example/Smith/) = July 1999
subject(http://www.xyz.de/Example/Smith/) = Intelligent Agents
creator(http://www.xyz.de/Example/Smith/) = http://www.xyz.de/~smith/
name(http://www.xyz.de/~smith/) = John Smith
email(http://www.xyz.de/~smith/) = smith@organisation.de

can be represented in linear XML syntax (see Fig. 9). This may raise the 
question why there is a need for RDF at all, because all metadata represented 
in RDF can also be represented in XML. However, RDF provides a standard 
form for representing metadata in XML. Directly using XML to represent 
metadata would result in its being represented in various ways. 
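
Independently of any XML encoding, the statements above share one uniform shape: a property applied to a resource yields a value. The following Python sketch makes this explicit by storing the five statements as (resource, property, value) triples and grouping them by resource. The tuple layout and the helper describe are our own illustrative choices and not part of RDF; the sketch also assumes at most one value per property.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (resource, property, value)

doc = "http://www.xyz.de/Example/Smith/"
author = "http://www.xyz.de/~smith/"

triples: List[Triple] = [
    (doc, "date", "July 1999"),
    (doc, "subject", "Intelligent Agents"),
    (doc, "creator", author),
    (author, "name", "John Smith"),
    (author, "email", "smith@organisation.de"),
]

def describe(statements: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Group statements by resource, giving one description per resource."""
    out: Dict[str, Dict[str, str]] = defaultdict(dict)
    for resource, prop, value in statements:
        out[resource][prop] = value
    return dict(out)

descriptions = describe(triples)
# Follow the creator link from the document to the name of its author.
creator_name = descriptions[descriptions[doc]["creator"]]["name"]
print(creator_name)
```

Because every statement has the same shape, a generic program can follow the creator link without knowing anything about the vocabulary in advance. An ad hoc XML encoding would not guarantee this.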

The difference becomes even more obvious when considering how to 
represent an ontology in RDF or XML. Earlier we discussed how an ontology 
can be used to generate a DTD describing the structure of XML documents. 

<?xml version="1.0"?>
<RDF
  xmlns="http://www.w3c.org/1999/02/22-rdf-syntax-ns#"
  xmlns:DC="http://www.purl.org/DC#/"
  xmlns:y="http://www.description.org/schema">
  <Description about="http://www.xyz.de/Example/Smith/">
    <DC:date rdf:resource="July 1999"/>
    <DC:subject rdf:resource="Intelligent Agents"/>
    <DC:creator rdf:resource="http://www.xyz.de/~smith/"/>
  </Description>
  <Description about="http://www.xyz.de/~smith/">
    <DC:name rdf:resource="John Smith"/>
    <DC:email rdf:resource="smith@organisation.de"/>
  </Description>
</RDF>



Fig. 9 XML representation of RDF statements 
However, we did not discuss how the ontology itself could be represented in
XML. To define a standardized manner in which ontologies can be 
represented in XML we have to address two questions: 

• What are the epistemological primitives used to represent an ontology 
(i.e., things like classes, is-a relationships, element-of relationships, 
attributes, domain and range restrictions etc.)? Basically these are 
decisions about the meta-ontology used to represent ontologies. 

• How can these concepts be represented in the linear syntax of XML? 

There are a number of different possibilities, and this makes clear how 
RDFS comes into the story. RDFS provides a fixed set of modeling 
primitives for defining an ontology (classes, resources, properties, is-a and 
element-of relationships, etc.) and a standard way to encode them in XML. 
Using XML directly for the purpose of representing ontologies would require 
us to duplicate this standardization effort. 



3.5 New Standards 

Currently several proposals have been made to unify ontology and Web 
languages. We will conclude our discussion by briefly dealing with these new 
approaches. 

3.5.1 XOL 

The BioOntology Core Group 17 recommends the use of a frame-based 
language with an XML syntax for the exchange of ontologies for molecular 
biology. The proposed language is called XOL 18 (see [Karp et al., 1999], 
[McEntire et al., 1999]). The ontology definitions that XOL is designed to 
encode include both schema information (metadata), such as class definitions
from object databases, and non-schema information (ground facts),
such as object definitions from object databases.

The syntax of XOL is based on XML. The modeling primitives and 
semantics of XOL are based on OKBC-Lite, which is a simplified form of the 
knowledge model for Open Knowledge Base Connectivity (OKBC) 19 
([Chaudhri et al., 1997], [Chaudhri et al., 1998]). OKBC is an application 



17 http://smi-web.stanford.edu/projects/bio-Ontology
18 http://www.Ontologos.org/Ontology/XOL.htm

19 http://www.ai.sri.com/~okbc. OKBC has also been chosen by FIPA as its exchange
standard for ontologies; see http://www.fipa.org, FIPA 98 Specification, Part 12 Ontology 
Service (see [FIPA 98, part 12]). 
<class>
  <name>person</name>
</class>

<slot>
  <name>age</name>
  <domain>person</domain>
  <value-type>integer</value-type>
  <numeric-max>150</numeric-max>
</slot>

<individual>
  <name>fred</name>
  <type>person</type>
  <slot-values>
    <name>age</name>
    <value>35</value>
  </slot-values>
</individual>

Fig. 10 An example in XOL (taken from [Karp et al., 1999]) 

program interface for accessing frame knowledge representation systems. Its 
knowledge model supports features most commonly found in knowledge 
representation systems, object databases, and relational databases. OKBC- 
Lite extracts most of the essential features of OKBC, but omits some of its 
more complex aspects. XOL was inspired by Ontolingua. It differs from 
Ontolingua, however, as it has an XML-based syntax rather than a Lisp-based 
syntax. 

The design of XOL deliberately uses a generic approach to define 
ontologies, meaning that the single set of XML tags defined for XOL 
(defined by a single XML DTD) can describe any and every ontology. This 
approach contrasts with the approaches taken by other XML schema 
languages, in which a generic set of tags is typically used to define the 
schema portion of the ontology and the schema itself is used to generate a 
second set of application-specific tags (and an application-specific DTD), 
which in turn are used to encode a separate XML file that contains the data 
portion of the ontology. Compare the XOL definitions in Figure 10. All of the 
XML elements of this specification (meaning all the words inside brackets), 
such as class, individual, and name, are generic, i.e., they pertain to all 
ontologies. All of the ontology-specific information is in the text portion of 
the XML file, i.e., between the pairs of elements. In contrast, approaches 
discussed earlier in this book might use the type of XML markup shown in
Figure 11 to define the individual Fred.
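
To make the generic approach concrete, the following Python sketch reads the XOL fragment of Figure 10 with a single routine that knows only the generic tags class, slot, individual, name, type, and slot-values. The wrapping ontology element, the function name read_xol, and the dictionary layout are assumptions made for illustration; they are not part of the XOL specification.

```python
import xml.etree.ElementTree as ET

# The fragment of Figure 10, wrapped in an <ontology> root element so that it
# is a single well-formed XML document (the wrapper and its name are assumed).
XOL_EXAMPLE = """
<ontology>
  <name>people</name>
  <class><name>person</name></class>
  <slot>
    <name>age</name>
    <domain>person</domain>
    <value-type>integer</value-type>
    <numeric-max>150</numeric-max>
  </slot>
  <individual>
    <name>fred</name>
    <type>person</type>
    <slot-values><name>age</name><value>35</value></slot-values>
  </individual>
</ontology>
"""

def read_xol(text):
    """Collect classes, slots and individuals using only generic tag names."""
    root = ET.fromstring(text)
    classes = [c.findtext("name") for c in root.findall("class")]
    slots = {}
    for s in root.findall("slot"):
        # Every child other than <name> is treated as a facet of the slot.
        slots[s.findtext("name")] = {
            child.tag: child.text for child in s if child.tag != "name"
        }
    individuals = {}
    for ind in root.findall("individual"):
        values = {
            sv.findtext("name"): [v.text for v in sv.findall("value")]
            for sv in ind.findall("slot-values")
        }
        individuals[ind.findtext("name")] = {
            "type": ind.findtext("type"),
            "values": values,
        }
    return classes, slots, individuals

classes, slots, individuals = read_xol(XOL_EXAMPLE)
print(classes)
print(slots)
print(individuals)
```

The same routine would read an ontology about genes or wines without change, because every ontology-specific term appears only as element text and never as a tag name.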

What are the advantages of the generic approach taken by XOL relative to 
the non-generic approach? The primary advantage of the XOL approach is 
simplicity. Only one XML DTD need be defined to describe any and every 
ontology. Using the non-generic approach, every ontology must define a
second, ontology-specific, DTD for describing the data elements of the 
ontology. Furthermore, rules would have to be defined that describe exactly 
how that second DTD is derived from the schema portion of the ontology, 
and most likely, programs would have to be written to generate such DTDs 
from schema specifications. The XML language provides no formal 
machinery to define those rules. The entire DTD defining valid XOL 
documents is given in Figure 12. XOL appears interesting because it provides 
ontological modeling primitives expressed in one of the most important 
information exchange standards, XML. 
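
The kind of derivation rule that the non-generic approach would require can be illustrated by a small Python sketch. It rewrites a generic XOL individual into the ontology-specific markup of Figure 11 using one assumed rule: the type becomes the element name and each slot becomes a child element. The function name to_specific and the rule itself are hypothetical; XML offers no standard way to state such a rule, which is the point made above.

```python
import xml.etree.ElementTree as ET

# A generic XOL individual, as in Figure 10.
GENERIC = """
<individual>
  <name>fred</name>
  <type>person</type>
  <slot-values><name>age</name><value>35</value></slot-values>
</individual>
"""

def to_specific(generic_xml):
    """Apply an assumed rule: type -> element name, slot name -> child tag."""
    ind = ET.fromstring(generic_xml)
    specific = ET.Element(ind.findtext("type"))
    ET.SubElement(specific, "name").text = ind.findtext("name")
    for sv in ind.findall("slot-values"):
        for value in sv.findall("value"):
            ET.SubElement(specific, sv.findtext("name")).text = value.text
    return specific

person = to_specific(GENERIC)
specific_xml = ET.tostring(person, encoding="unicode")
print(specific_xml)
```

A consumer of the resulting document has to know in advance that person and age are legal tags, which is exactly the second, ontology-specific DTD that the generic approach avoids.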



3.5.2 OIL 

OIL 20 (see [Fensel et al., 2001], [Fensel et al., 2000(b)]) unifies three 
important paradigms provided by different communities (see Fig. 13): formal 
semantics and efficient reasoning support as provided by description logics; 
epistemologically rich modeling primitives as provided by the frame 
community; and a standard proposal for syntactical exchange notation as 
provided by the Web community. 

Description Logics. Description Logics describe knowledge in terms of 
concepts and role restrictions that are used to automatically derive 
classification taxonomies. In spite of the discouraging theoretical complexity
results for these logics, there are now efficient implementations for DL languages;
see, for example, DLP 21 and the FaCT system 22. OIL inherits from
description logic its formal semantics and the efficient reasoning support
developed for these languages. In OIL, subsumption is decidable, and
FaCT provides an efficient reasoner for it.
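
As a rough illustration of how a classification taxonomy can be derived automatically, the Python sketch below treats each concept as a conjunction of atomic features, so that one concept subsumes another exactly when its features are a subset of the other's. This is a deliberately tiny language and not the algorithm used by FaCT or DLP, which handle far more expressive logics; the concept names and helper functions are invented for the example.

```python
# Each concept is defined as a conjunction of atomic features (assumed data).
CONCEPTS = {
    "animal": {"living"},
    "herbivore": {"living", "eats-plants"},
    "carnivore": {"living", "eats-meat"},
    "giraffe": {"living", "eats-plants", "long-neck"},
}

def subsumes(general, specific):
    """general subsumes specific if every feature it demands is present."""
    return CONCEPTS[general] <= CONCEPTS[specific]

def direct_parents(name):
    """Most specific proper subsumers of a concept."""
    above = [c for c in CONCEPTS
             if c != name and subsumes(c, name) and not subsumes(name, c)]
    # Keep only subsumers that do not lie above another subsumer.
    return sorted(p for p in above
                  if not any(q != p and subsumes(p, q) for q in above))

# The taxonomy is computed from the definitions; it is never asserted by hand.
taxonomy = {name: direct_parents(name) for name in CONCEPTS}
print(taxonomy)
```

Note that giraffe is placed under herbivore although no such link was stated; it follows from the definitions alone, which is the service a description logic classifier provides.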

Frame-based systems. The central modeling primitives of predicate logic 
are predicates. Frame-based and object-oriented approaches take a different 
point of view. Their central modeling primitives are classes (i.e., frames) with 
certain properties called attributes. These attributes do not have a global 

<person>
  <name>fred</name>
  <age>35</age>
</person>

Fig. 11 Non-reusable ontology specification



20 http://www.ontoknowledge.org/oil 

21 http://www.bell-labs.com/user/pfps 

22 http://www.cs.man.ac.uk/~horrocks/FaCT. Actually OIL uses FaCT as its inference engine. 

<!ELEMENT (module | ontology | kb | database | dataset)
    ( name, (kb-type | db-type)?, package?, version?,
      documentation?, class*, slot*, individual* )>
<!ELEMENT name (#PCDATA)>
<!ELEMENT kb-type (#PCDATA)>
<!ELEMENT documentation (#PCDATA)>
<!ELEMENT class
    ( name, documentation?,
      (subclass-of | instance-of | slot-values)* )>
<!ELEMENT slot
    ( name, documentation?,
      (domain | slot-value-type | slot-inverse | slot-cardinality |
       slot-maximum-cardinality | slot-minimum-cardinality |
       slot-numeric-minimum | slot-numeric-maximum |
       slot-collection-type | slot-values)* )>
<!ATTLIST slot
    type ( template | own ) "own">
<!ELEMENT individual
    ( name, documentation?, (type | slot-values)* )>
<!ELEMENT slot-values
    ( name, value*,
      (facet-values | value-type | inverse |
       cardinality | maximum-cardinality | minimum-cardinality |
       numeric-minimum | numeric-maximum | some-values |
       collection-type | documentation-in-frame)* )>
<!ELEMENT facet-values
    ( name, value* )>
<!ELEMENT subclass-of (#PCDATA)>
<!ELEMENT instance-of (#PCDATA)>
<!ELEMENT domain (#PCDATA)>
<!ELEMENT slot-value-type (#PCDATA)>
<!ELEMENT slot-inverse (#PCDATA)>
<!ELEMENT slot-cardinality (#PCDATA)>
<!ELEMENT slot-maximum-cardinality (#PCDATA)>
<!ELEMENT slot-minimum-cardinality (#PCDATA)>
<!ELEMENT slot-numeric-minimum (#PCDATA)>
<!ELEMENT slot-numeric-maximum (#PCDATA)>
<!ELEMENT slot-collection-type (#PCDATA)>
<!ELEMENT value-type (#PCDATA)>
<!ELEMENT inverse (#PCDATA)>
<!ELEMENT cardinality (#PCDATA)>
<!ELEMENT maximum-cardinality (#PCDATA)>
<!ELEMENT minimum-cardinality (#PCDATA)>
<!ELEMENT numeric-minimum (#PCDATA)>
<!ELEMENT numeric-maximum (#PCDATA)>
<!ELEMENT some-values (#PCDATA)>

Fig. 12 The XOL DTD (see http://www.ontologos.org/Ontology/XOL.htm)







scope but are only applicable to the classes they are defined for (they are 
typed), and the “same” attribute (i.e., the same attribute name) may be 
associated with different range and value restrictions when defined for 
different classes. A frame provides a certain context for modeling one aspect 
of a domain. Many additional refinements of these modeling constructs
have been developed and have led to the great success of this modeling
paradigm. Many frame-based systems and languages have been developed
and, renamed as object-orientation, they have conquered the software
engineering community. Therefore, OIL incorporates the essential modeling
primitives of frame-based systems into its language. OIL is based on the
notion of a concept and the definition of its superclasses and attributes.
Relations can be defined not only as attributes of a class but also as
independent entities having a certain domain and range. Like classes, relations
can be arranged in a hierarchy.
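
The locality of attributes can be sketched in Python as follows. Each frame carries its own attribute definitions, so the same attribute name may have a different range in different classes, and a class inherits the definitions of its superclasses. The Frame class and the example frames are hypothetical and serve only to illustrate the idea; they do not reproduce OIL syntax.

```python
class Frame:
    """A class (frame) with locally defined, typed attributes."""

    def __init__(self, name, parent=None, attributes=None):
        self.name = name
        self.parent = parent
        self.attributes = dict(attributes or {})  # attribute name -> range

    def range_of(self, attribute):
        """Look up the range locally first, then along the superclasses."""
        frame = self
        while frame is not None:
            if attribute in frame.attributes:
                return frame.attributes[attribute]
            frame = frame.parent
        raise KeyError(f"{attribute!r} is not defined for {self.name}")

# The "same" attribute age has a different range for persons and for wines.
person = Frame("person", attributes={"age": range(0, 151)})
student = Frame("student", parent=person)
wine = Frame("wine", attributes={"age": range(0, 1001)})

print(35 in student.range_of("age"))   # inherited from person
print(300 in person.range_of("age"))
print(300 in wine.range_of("age"))
```

An attribute that is not defined for a frame or any of its superclasses simply does not exist in that context, which is what distinguishes this view from a predicate with global scope.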

Web standards: XML and RDF. Modeling primitives and their semantics 
are one aspect of an ontology language. Next, we have to decide on its
syntax. Given the current dominance and importance of the WWW, the syntax
of an ontology exchange language must be formulated using existing Web
standards for information representation. OIL is closely related to XOL and
can be seen as an extension of it. For example, XOL only allows necessary 
but not sufficient class definitions (i.e., a new class is always a subclass of, 
and not exactly equal to, its specification) and only class names, but not class 
expressions (except for the limited form of expression provided by slots and 
their facets), can be used in defining classes. The XML syntax of OIL was 
mainly defined as an extension of XOL. Another candidate for a Web-based 
syntax for OIL is RDF together with RDFS. In regard to ontologies, RDFS 

[Figure: OIL at the center of three roots. Description logics: formal semantics and reasoning support. Frame-based systems: epistemological modeling primitives. Web languages: XML- and RDF-based syntax.]

Fig. 13 The three roots of OIL

provides two important contributions: a standardized syntax for writing
ontologies, and a standard set of modeling primitives like instance-of and 
subclass-of relationships. Therefore, OIL offers two syntactical variants: one 
based on XML schema and one based on RDF schema. 
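
The distinction drawn earlier in this section between necessary and sufficient class definitions can be illustrated with a short Python sketch. With a necessary definition, membership must be asserted and the condition is merely checked; with a definition that is both necessary and sufficient, membership can be inferred from the condition. The class name adult, the predicate, and the helper functions are assumptions for the example and do not reproduce XOL or OIL syntax.

```python
def is_adult(individual):
    """The defining condition used by both kinds of definition."""
    return individual.get("age", 0) >= 18

def check_necessary(individual, condition):
    """Necessary only: an asserted member must satisfy the condition."""
    return condition(individual)

def infer_members(individuals, condition):
    """Necessary and sufficient: whatever satisfies the condition is a member."""
    return sorted(name for name, ind in individuals.items() if condition(ind))

individuals = {
    "fred": {"age": 35, "asserted": {"adult"}},
    "mary": {"age": 41, "asserted": set()},
    "tim": {"age": 9, "asserted": set()},
}

# Necessary-only reading: only fred is an adult, because only he is asserted so.
asserted_adults = sorted(n for n, i in individuals.items()
                         if "adult" in i["asserted"])
# Reading with a sufficient definition: mary is classified as an adult too.
inferred_adults = infer_members(individuals, is_adult)

print(asserted_adults, inferred_adults)
print(check_necessary(individuals["fred"], is_adult))
```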



3.5.3 DAML+OIL 

DAML+OIL 23 builds on work from the OIL initiatives (see [McGuinness et
al., 2002]). It provides modeling primitives commonly found in frame-based 
languages (such as an asserted subsumption hierarchy and the description or 
definition of classes through slot fillers) and has a clean and well-defined 
semantics. DAML+OIL is effectively an alternative presentation syntax for a 
description logic (SHIQ with the addition of concrete data types) with an 
underlying RDFS-based delivery mechanism. The presence of well-defined
semantics in terms of SHIQ allows the use of description logic
reasoners such as FaCT 24 or RACER, 25 in particular to support the tasks of
classification and inconsistency detection. The development of DAML+OIL
was the responsibility of the Joint US/EU ad hoc Agent Markup Language
Committee. 26 Many members of that committee are now part of the WebOnt
Committee, which we discuss in the next section.



3.5.4 The Web Ontology Language OWL 

The W3C Web Ontology Working Group, 27 part of the W3C’s Semantic 
Web Activity, 28 has focused on the development of a language to extend the 
semantic reach of current XML and RDF metadata efforts. The working 
group builds the ontological layer necessary for developing applications that
depend on an understanding of logical content, not just human-readable 
presentation, and the formal underpinnings thereof. Specifically, the Web 
Ontology Working Group is chartered to design a Web Ontology language, 
which builds on current Web languages that allow the specification of classes 
and subclasses, properties and subproperties (such as RDFS), but which 
extends these constructs to allow more complex relationships between 
entities including: the means to limit the properties of classes with respect to 
number and type; the means to infer that items with various properties are 



23 http://www.daml.org/2001/03/daml+oil-index.html

24 http://www.cs.man.ac.uk/fact 

25 http://kogs-www.informatik.uni-hamburg.de/~race 

26 http://www.daml.org/committee 

27 http://www.w3.org/2001/sw/WebOnt 

28 http://www.w3.org/2001/sw 
members of a particular class; a well-defined model of property inheritance;
and similar semantic extensions to the base languages. 

The Web Ontology Language (OWL) Reference Draft [Dean et al., 2002] 
provides the OWL Web Ontology Language specification. OWL is derived 
from the DAML+OIL language and builds upon RDFS. Because this
language is still under development, we refer the interested reader to the
website of the W3C: http://www.w3.org/2001/sw/WebOnt.

When comparing OIL and OWL the following main aspects are apparent: 

• OIL had two syntactical definitions: one in plain XML (based on XML 
schema) and one in RDFS. The reason was that OIL did not want to 
completely subscribe to the RDFS world, leaving the much larger XML 
world behind. OWL subscribes to RDFS only. 

• Both OIL and OWL are layered languages. OWL-Lite [McGuinness & 
van Harmelen, 2002] defines a subset of OWL. Unfortunately, this subset
is not well thought out. One would expect that the
simple sublanguage does not require a description logic type of reasoner.
However, the fact that cardinality constraints can be used to derive 
equalities of terms requires DL-based reasoning to decide whether two 
terms are equal (instead of simple term unification in standard logical 
reasoning). On the one hand, this makes even the simple sublanguage 
complex to deal with. On the other hand, it does not add anything to its 
usage. It is bizarre to model equality of terms via cardinality constraints 
and one needs much richer means to express equality of terms anyway 
than cardinality constraints or factual equality statements. From our 
point of view it would have been much wiser to assume non-equality of 
terms (i.e., the unique-name assumption of the database area) and 
provide a much more powerful oracle beyond the ontology language that 
normalizes different terms that denote the same thing. 

• OIL and OWL are based on description logic; however, it has never been
proven that this is the appropriate logical paradigm for the Semantic Web.
OIL combined a description logic with a frame-based orientation. OWL 
is more an RDF(S) syntax for a description logic. 
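
The point about cardinality constraints in the second item above can be made concrete with a small Python sketch. A property restricted to at most one value forces two fillers of that property to denote the same individual, unless names are assumed to be unique, in which case the same data is simply inconsistent. The function, its parameters, and the example data are illustrative assumptions and are not taken from the OWL specification.

```python
def apply_max_one(assertions, unique_names):
    """Process (subject, value) pairs of a property with max cardinality 1.

    Returns the set of derived equalities; raises ValueError on a clash
    when the unique-name assumption is in force.
    """
    first_value = {}
    equalities = set()
    for subject, value in assertions:
        known = first_value.setdefault(subject, value)
        if known != value:
            if unique_names:
                raise ValueError(
                    f"{subject} has two distinct values: {known}, {value}")
            # Without unique names the two terms must denote the same thing.
            equalities.add(frozenset((known, value)))
    return equalities

has_mother = [("fred", "mary"), ("fred", "mrs_smith")]

# Without the unique-name assumption an equality of terms is derived.
derived = apply_max_one(has_mother, unique_names=False)
print(derived)

# With the unique-name assumption the same assertions are a contradiction.
try:
    apply_max_one(has_mother, unique_names=True)
    clash = False
except ValueError:
    clash = True
print(clash)
```

Under the first reading a reasoner has to track derived equalities during inference; under the second, a simple comparison of names suffices, which is the simplification argued for above.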

XOL, OIL, DAML+OIL, and OWL represent different points in the
coordinate system of Web ontology languages. Table 1 summarizes our
comparison. In a sense, only OWL has any chance of survival. However,
future usage of OWL may require adaptations that will be easier if some of
the different design decisions are kept in mind.

Table 1. Summarizing language features

This chapter describes tools that help us to work with ontologies and to apply
them to improve information access. We start with a general survey of
various aspects of ontology tooling and give examples. Then we describe a 
tool that was among the first to merge the ontology paradigm with the Web, 
helping to create the research field that is now called the Semantic Web. Its 
description gives us an example that illustrates the different requirements to 
make ontology technology work. Then we discuss some professional tools 
now available on the market. 



4.1 A Survey 

Effective and efficient work with ontologies must be supported by advanced 
tools enabling the full power of this technology. In particular, we need the 
following elements: 

• Ontology languages to express and represent ontologies. 

• Ontology editors and semi-automatic ontology construction to build new 
ontologies. 

• Reuse and merging of ontologies, i.e., ontology environments that help 
to create new ontologies by reusing existing ones. 

• Reasoning with ontologies: Instance and schema inferences to enable 
advanced query answering services, support ontology creation and help 
to map between different ontologies. 

• Ontology-based annotation tools to enable unstructured and 
semistructured information sources to be linked with ontologies. 

• Ontology-based tools for information access and navigation to enable 
intelligent information access for human users. 

We will now discuss these elements in some detail.



4.1.1 Ontology Languages

Ontology languages must fulfill three important requirements: 

• They must be highly intuitive to the human user. Given the current 
success of the frame-based and object-oriented modeling paradigm, they 
should have a frame-like look and feel. 

• They must have a well-defined formal semantics with established 
reasoning properties in terms of completeness, correctness, and 
efficiency. 

• They must have a proper link with existing Web languages like XML 
and RDF, ensuring interoperability. 

A more detailed discussion of these aspects was provided in Chapter 3.



4.1.2 Ontology Editors and Semi-automatic Ontology Construction 

Ontology editors help human knowledge engineers to build ontologies. 
Ontology editors support the definition of concept hierarchies, the definition
of attributes for concepts, and the definition of axioms and constraints. They
must provide graphical interfaces and must conform to existing standards in 
Web-based software development. They enable inspection, browsing, 
codification and modification of ontologies and thus support their 
development and maintenance. Examples are: 

• Protege 1 (see [Grosso et al., 1999]) and Protege-2000 (see [Puerta et al., 
1992], [Erikson et al., 1999]) are versions of a series of tools developed 
by the Knowledge Modeling Group at Stanford Medical Informatics to 
assist developers in the construction of large electronic knowledge bases 
(see Figure 14). Protege allows developers to create, browse and edit 
domain ontologies in a frame-based representation, which is compliant 
with the OKBC knowledge model [Chaudhri et al., 1998]. Starting with 
an ontology, Protege automatically constructs a graphical knowledge- 
acquisition tool that allows application specialists to enter the detailed 
content knowledge required to define specific applications. Protege 
allows developers to customize this knowledge-acquisition tool directly 
by arranging and configuring the graphical entities in forms that are 
attached to each class in the ontology for the acquisition of instances. 
This allows application specialists to enter domain information by filling 



1 http://www.smi.stanford.edu/projects/protege
Fig. 14 Protege



in the blanks of intuitive forms and by drawing diagrams composed of 
selectable icons and connectors. Protege-2000 allows knowledge bases 
to be stored in several formats, among others RDF. 

• OntoEdit 2 [Sure et al., 2002] is an ontology engineering environment 
developed at the Knowledge Management Group of the University of 
Karlsruhe. Currently OntoEdit supports representation languages such as 
F-logic, OIL, and RDFS. It is marketed by Ontoprise 3.

• WebOnto 4 is a Java applet coupled with a customized Web server which 
allows users to browse and edit ontologies over the Web. The resulting 
Planet-Onto architecture provides an integrated set of tools to support 
news publishing, ontology-driven document formalization, story 
identification and personalized news feeds and alerts. It was developed at 
the Knowledge Media Institute of the Open University in Milton Keynes. 



2 http://ontoserver.aifb.uni-karlsruhe.de/ontoedit 

3 http://www.ontoprise.de 

4 http://kmi.open.ac.uk/projects/webonto 
• OilEd 5 is a simple editor that allows the user to create and edit OIL
ontologies. The main intention behind OilEd is to provide a simple, 
freeware editor that demonstrates the use of, and stimulates interest in, 
DAML+OIL. OilEd is not intended as a full ontology development 
environment - it will not actively support the development of large-scale 
ontologies, the migration and integration of ontologies, versioning, 
argumentation and many other activities that are involved in ontology 
construction. It should, however, provide enough to allow the basic 
construction of OIL ontologies and demonstrate the power of the 
connection to the FaCT reasoner. Both OilEd and FaCT were developed at the
University of Manchester, UK.

• ODE (Ontology Design Environment) 6 is a software tool for specifying 
ontologies at a high conceptual level. ODE allows developers to specify 
their ontology by filling in tables and drawing graphs. Its multilingual 
generator module automatically translates the specification of the 
ontology into target languages. It was developed at the University of 
Madrid. 

• Many authors recommend UML as a representation language (see, for 
example, [Cranefield & Purvis, 1999]) for ontologies. The advantage of 
using UML development environments is that standard software 
development tools can be used to build ontologies. A disadvantage is that 
such tools may provide less support than customized special-purpose 
ontology editors. 

Manually building ontologies is a time-consuming task. It is very difficult
and cumbersome to manually derive ontologies from data. This appears to be
true regardless of the type of data one might consider. Natural language
texts exhibit morphological, syntactic, semantic, pragmatic and conceptual
constraints that interact in order to convey a particular meaning to the reader.
Thus, the text transports information to the reader and the reader embeds this
information into his or her background knowledge. Tools that learn ontologies from
natural language exploit the interacting constraints on the various language 
levels (from morphology to pragmatics and background knowledge) in order 
to discover new concepts and stipulate relationships between concepts. 
Therefore, in addition to editor support, semi-automated tools in ontology 
development help to improve the overall productivity. These tools combine 
machine learning, information extraction and linguistic techniques. The main 
tasks are: 



5 http://oiled.man.ac.uk 

6 http://delicias.dia.fi.upm.es/miembros/ASUN/asun_CV_Esp.html 
• extraction of relevant concepts;

• building is-a hierarchies;

• extraction of relationships between concepts.

A review of the state of the art on ontology learning was presented at an
ECAI workshop 7 in Berlin in August 2000. Example systems are:

• Asium, which stands for “Acquisition of Semantic knowledge Using
Machine learning method”. The main aim of Asium is to help the expert 
in the acquisition of semantic knowledge from texts and to generalize the 
knowledge of the corpus. Asium provides the expert with a powerful and 
user-friendly interface which will first help him or her to explore the 
texts and then to learn knowledge which is not in the texts. 

• The Text-To-Onto system, which provides an integrated environment for
the task of learning ontologies from text (see Figure 15). The
Text Management module enables a relevant corpus to be selected.
These domain texts may be both natural language texts and HTML
formatted texts. For a meaningful analysis, the text has to be preprocessed.
The Text Management module serves as an interface to the Information
Extraction Server. If a domain lexicon already exists, the information
extraction server performs domain-specific parsing. The results of the
parsing process are stored in XML or feature-value structures. The
Management Module offers all existing learning components to the user.
Typically these components are parametrizable. Existing knowledge
structures (e.g., a taxonomy of concepts) are incorporated as background
knowledge. On the basis of the domain texts, the learning component
discovers new knowledge structures, which are collected in the
ontology modeling module to expand the existing ontology. Text-To-Onto
was developed by the Knowledge Management Group of the
University of Karlsruhe.

4.1.3 Reusing and Merging Ontologies: Ontology Environments 

Assuming that the world is full of well-designed modular ontologies, 
constructing a new ontology is a matter of assembling existing ones. Instead 
of building ontologies from scratch one wants to reuse existing ontologies. 
Tools that best support this approach allow for the adaptation and merging of 
existing ontologies to fit them for new tasks and domains. The knowledge 
engineer needs support in merging multiple ontologies and diagnosing 



7 http://ol2000.aifb.uni-karlsruhe.de
Fig. 15 Ontotext

individual or multiple ontologies. He or she requires support in such tasks as using
ontologies in differing formats, reorganizing taxonomies, resolving name
conflicts, browsing ontologies, editing terms, etc. Possible solution providers
for ontology environments are the following:

• [Farquhar et al., 1997] describes the Ontolingua server, which provides 
different kinds of operations for combining ontologies: inclusion; 
restriction; and polymorphic refinement. For example, inclusion of one 
ontology in another has the effect that the resulting ontology consists of 
the union of the two ontologies (their classes, relations, axioms). 

• The SENSUS system [Swartout et al., 1996] provides a means for 
constructing a domain-specific ontology from given common-sense 
ontologies. The basic idea is to use so-called seed elements which 
represent the most important domain concepts for identifying the 
relevant parts of a top-level ontology. The selected parts are then used as 
starting points for extending the ontology with further domain-specific 
concepts. 
• The SKC project (Scalable Knowledge Composition) [Jannink et al.,
1998] aims to develop an algebra for systematically composing 
ontologies from already existing ones. It will offer union, intersection, 
and difference as basic operations for such an algebra. 

• Chimaera provides support for two important tasks: (1) merging multiple 
ontologies and (2) diagnosing (and evolving) ontologies [McGuinness et 
al., 2000]. We will describe this system in more detail below. 

• PROMPT (formerly known as SMART) is an interactive ontology-merging
tool [Noy & Musen, 2000]. It guides the user through the
merging process, making suggestions, determining conflicts, and
proposing conflict-resolution strategies. PROMPT starts with the
linguistic similarity matches of frame names for the initial comparison,
but concentrates on finding clues based on the structure of the ontology
and the user’s actions. After the user selects an operation to perform,
PROMPT determines the conflicts in the merged ontology that the
operation has caused and proposes possible solutions to the conflicts. It
then considers the structure of the ontology around the arguments to the
latest operations (relations among the arguments and other concepts in
the ontology) and proposes other operations that the user should perform.
In the PROMPT project, a set of knowledge-base operations for ontology 
merging or alignment is identified. For each operation in this set the 
following is defined: (1) the changes that PROMPT performs 
automatically; (2) the new suggestions that PROMPT presents to the 
user; and (3) the conflicts that the operation may introduce and that the 
user needs to resolve. When the user invokes an operation, PROMPT 
creates members of these three sets based on the arguments to the 
specific invocation of the operation. 

• OntoMorph [Chalupsky, 2000] is a transformation system for symbolic 
knowledge. It facilitates ontology merging and the rapid generation of 
knowledge-base translators. It combines two mechanisms to describe 
knowledge-base transformations: (1) syntactic rewriting via pattern- 
directed rewrite rules that allow the concise specification of sentence- 
level transformations based on pattern matching; and (2) semantic 
rewriting which modulates syntactic rewriting via (partial) semantic 
models and logical inference via an integrated knowledge representation 
system. The integration of these mechanisms allows transformations to 
be based on any mixture of syntactic and semantic criteria. The 
OntoMorph architecture facilitates incremental development and 
scripted replay of transformations, which is particularly important during 
merging operations. OntoMorph focusses on the transformations to 
individual ontologies that are needed to bring two or more ontologies
into mutual agreement. This is a small but important step in the process 
of merging ontologies. OntoMorph is able to solve several problems at 
the language level of ontology mismatches. Of course, a difference in 
expressivity between two languages is not solvable and may imply loss 
of knowledge. Solutions for ontology-level problems can also be 
formulated in OntoMorph. Because OntoMorph requires a clear and 
executable specification of the transformation, the process can be 
repeated with modified versions of the original ontologies. To
summarize, OntoMorph makes it possible to specify transformations
of an ontology, at both the syntactic and the semantic level, that can
be carried out automatically.

• OntoView [Klein et al., 2002] is a Web-based system that provides 
support for the versioning of online ontologies, which might help to 
solve some of the problems of evolving ontologies on the Web. Its main 
function is to help the user to manage changes in ontologies and to keep
different ontology versions as interoperable as possible. It does
this by comparing versions of ontologies and highlighting the 
differences. It then allows the user to specify the conceptual relation 
between the different versions of concepts. It also provides a transparent 
interface to arbitrary versions of ontologies. To achieve this, the system 
maintains an internal specification of the relation between the different 
variants of ontologies: it keeps track of the metadata, the conceptual 
relations between constructs in the ontologies and the transformations 
between them. OntoView has been inspired by the Concurrent 
Versioning System (CVS), used in software development to allow 
collaborative development of source code. 
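
The composition operations mentioned above for the Ontolingua server (inclusion) and the SKC project (union, intersection, and difference) can be sketched with plain set operations in Python. Here an ontology is reduced to a set of classes and a set of subclass-of pairs; this representation, the Ontology type, and the function names are assumptions for illustration and not the data model or API of either system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ontology:
    classes: frozenset
    subclass_of: frozenset  # pairs (subclass, superclass)

def make(classes, pairs):
    return Ontology(frozenset(classes), frozenset(pairs))

def union(a, b):
    """Inclusion of one ontology in another: everything from both."""
    return Ontology(a.classes | b.classes, a.subclass_of | b.subclass_of)

def intersection(a, b):
    """The part on which both ontologies agree."""
    return Ontology(a.classes & b.classes, a.subclass_of & b.subclass_of)

def difference(a, b):
    """Classes of a absent from b, with relations among those classes only."""
    classes = a.classes - b.classes
    pairs = frozenset((s, t) for s, t in a.subclass_of
                      if s in classes and t in classes)
    return Ontology(classes, pairs)

vehicles = make({"vehicle", "car", "truck"},
                {("car", "vehicle"), ("truck", "vehicle")})
transport = make({"vehicle", "car", "ship"},
                 {("car", "vehicle"), ("ship", "vehicle")})

merged = union(vehicles, transport)
shared = intersection(vehicles, transport)
only_vehicles = difference(vehicles, transport)
print(sorted(merged.classes))
print(sorted(shared.subclass_of))
print(sorted(only_vehicles.classes))
```

The sketch matches classes purely by name. Deciding when two differently named classes denote the same concept, or two equally named classes do not, is the hard part that tools such as PROMPT and Chimaera support interactively.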

Further approaches are described in [Amann & Fundulaki, 1999] and 
[Weinstein & Birmingham, 1999]. 

Figure 16 illustrates Chimaera from Stanford University. 8 Chimaera
is primarily intended as a tool for merging knowledge-base fragments. The
process of knowledge-base merging typically involves such activities as
resolving name conflicts and aligning the taxonomy. To this end, this tool has
special support for finding name conflicts and for walking the user through
the merged taxonomy, pointing out places that are likely to need attention.

• Taxonomic relationships. The primary function of Chimaera is to set up 
taxonomic relationships. There are two types of subclass-of relationship: 



8 A summary of the Chimaera documentation from which we quote is available at
http://www-ksl-svc.stanford.edu:5915/doc/chimaera/chimaera-docs.html.
Fig. 16 Chimaera

direct and indirect. The indirect subclass is derived through several direct 
subclass-of relations using transitivity of the relation. The merger allows 
you to arrange only direct taxonomic relationships and does not support
indirect subclass-of relations.

• Decompositions. Chimaera has special features for capturing and 
displaying three types of class decomposition. The first type is disjoint 
decomposition, where the subclasses in question are disjoint from one 
another. In exhaustive decomposition the subclasses exhaustively cover 
the subclasses of the class in question. Any subclass of the class in 
question must be a subclass of one of these classes. Finally, in partition 
the subclasses are all mutually disjoint, and exhaustively cover the 
subclasses of the class in question. A partition is therefore an exhaustive, 
disjoint decomposition. 

• Slots. Chimaera has facilities for manipulating not only the taxonomic 
relationships between classes, but also the slots (attributes) of those 
classes. Chimaera recognizes two distinct types of slots: own slots and 
template slots. Own slots are slots that specify properties of the object 
itself, as opposed to properties of instances of a class. Own slots on 
classes are just like own slots on individuals; they say things about the 
class itself. Template slots are slots defined on classes that are 
manifested on instances of those classes. For example, we might define 
the template slot number-of-legs on the class Animal. This means that all 
subclasses of Animal will also have this template slot, and that all direct
instances of the class Animal and all instances of all subclasses of 
Animal will have number-of-legs as an own slot. 

• Analysis modes. There are three different modes in Chimaera. In name
mode you are presented with pairs of classes whose names are similar
enough that they might either represent the same class from
different input knowledge bases and should be merged, or be
in need of taxonomic editing to make one a subclass of the other. The
taxonomy traversal mode guides you through the taxonomy, causing you
to look at any class that has subclasses that came from multiple source 
knowledge bases. Such classes are likely places for the inclusion of new 
decompositions. The slot traversal mode guides you through all of the 
classes that, as a result of the merging operations, now have slots that 
came from multiple knowledge bases. Such slots might need merging. 
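
The three kinds of decomposition described above can be checked mechanically. The following Python sketch is not Chimaera's implementation. It assumes, purely for illustration, that every class is given extensionally as a set of its members (the text above states the conditions in terms of subclasses), and all function names are hypothetical.

```python
from itertools import combinations


def is_disjoint(subclasses):
    """Disjoint decomposition: no two subclasses share a member."""
    return all(a.isdisjoint(b) for a, b in combinations(subclasses, 2))


def is_exhaustive(parent, subclasses):
    """Exhaustive decomposition: the subclasses together cover the parent."""
    covered = set().union(*subclasses)
    return parent <= covered


def is_partition(parent, subclasses):
    """Partition: a decomposition that is both disjoint and exhaustive."""
    return is_disjoint(subclasses) and is_exhaustive(parent, subclasses)
```

For example, with Animal = {cat, dog, eagle}, the subclasses {cat, dog} and {eagle} form a partition, whereas {cat, dog} and {cat} are neither disjoint nor exhaustive.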



4.1.4 Reasoning with Ontologies: Instance and Schema Inferences 

Inference engines for ontologies can be used to reason about instances of an 
ontology or over ontology schemes. 

Reasoning over instances of an ontology means, for example, deriving a certain 
value for an attribute applied to an object. Such an inference service can be used 
to answer queries about explicit and implicit knowledge specified by an 
ontology. The powerful support for formulating rules and constraints and for 
answering queries about schema information goes far beyond existing database 
technology. These inference services are the equivalent of SQL query engines 
for databases, but they provide stronger support (for example, recursive 
rules). Example systems are CLIPS 9 (used as one output format of Protege), 
SWI Prolog 10, Ontobroker 11, and Flora 12. Alternatively, simpler RDF query 
engines 13 based on database technology can be used. They cause fewer 
scalability problems, but they restrict the expressive power for rules and queries 
and the way they access the concept definitions of an ontology (e.g., they may 
ignore inheritance). 

Reasoning over concepts of an ontology means, for example, automatically 
deriving the right position of a new concept in a given concept hierarchy. 
FaCT (Fast Classification of Terminologies) 14 can be used to automatically 
derive concept hierarchies. It is a description logic classifier that makes use of 
the well-defined semantics of OIL. 

9 CLIPS: A Tool for Building Expert Systems. http://www.ghgcorp.com/clips/CLIPS.html 

10 http://www.swi.psy.uva.nl/projects/SWI-Prolog 

11 http://www.ontoprise.de 

12 http://www.cs.sunysb.edu/~sbprolog/flora 

13 A survey of RDF query engines can be found at [Broekstra et al., 2000], 

Both types of reasoners help to build ontologies and to use them for 
advanced information access and navigation, as we will discuss below. 
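
To make the role of recursive rules and inheritance concrete, the following Python sketch answers the instance query "which objects are instances of a class?" while following the subclass relation transitively. It is only an illustration under simple assumptions: the dictionaries `subclass_of` and `instance_of` and both function names are hypothetical and do not correspond to the interface of any of the systems listed above. A query engine that ignores inheritance would look only at the direct class of each object.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls by the recursive subclass rule."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls, instance_of, subclass_of):
    """Objects whose direct class is cls or one of its (indirect) subclasses."""
    return {
        obj
        for obj, direct in instance_of.items()
        if direct == cls or cls in superclasses(direct, subclass_of)
    }
```

With the hierarchy Researcher, AcademicStaff, Employee, Person and an object declared only as a Researcher, the query for Person still returns that object.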



4.1.5 Ontology-based Annotation Tools 

Ontologies can be used to describe large document collections. Tools need to 
help the knowledge engineer to establish large numbers of links between 
ontologies and documents via: 

• linking an ontology with a database schema or deriving a database 
schema from an ontology in the case of structured data; 

• deriving an XML DTD, an XML schema, and an RDF schema from an 
ontology in the case of semistructured data; 

• manually or semi-automatically adding ontological annotations to 
unstructured data. 

More details can be found in [Erdmann & Studer, 2001] and [Klein et al., 2000]. 



4.1.6 Using Ontologies for Information Access and Navigation 

Work with the Web is currently done at a very low level: clicking on links 
and using keyword search for links is the main (if not the only) navigation 
technique. It is like programming with assembler and go-to instructions. Such 
a low-level interface may significantly hamper the expected future growth of 
the Web. 

• Keyword-based search retrieves irrelevant information that uses a certain 
word in a different sense or it may miss information where different 
words are used to describe the desired content. Navigation is only 
supported by predefined links and does not support clustering and 
linking of pages based on semantic similarity. 

• The query responses require human browsing and reading to extract the 
relevant information from these information sources. This burdens Web 
users with an additional loss of time and seriously limits information 
retrieval by automatic agents that miss all common-sense knowledge 
required to extract such information from textual representations. 



14 http://www.cs.man.ac.uk/~horrocks/FaCT 







• Keyword-based document retrieval fails to integrate information spread 
over different sources. 

• Finally, current retrieval services can only retrieve information that is 
located explicitly on the Web. No further inference service is provided 
for deriving implicit information. 

Ontologies help to overcome such bottlenecks in information access. They 
support information retrieval based on the actual content of a page. They help 
to navigate the information space based on semantic concepts. They enable 
advanced query answering and information extraction services, integrating 
heterogeneous and distributed information sources enriched by inferred 
background knowledge. Ontology technology provides two major 
improvements: 

• Semantic information visualization does not group information by 
location but by content. Examples are the hyperbolic browsing interface 
of Ontoprise 15 and the page content visualization tool of 
Aidministrator 16 (see Figure 17). 

• Query answering service for semistructured information sources. 

In the following, we will discuss some of these aspects in more detail. 



4.2 Ontobroker and On2broker: Early Research Prototypes 

Ontobroker (see [Fensel et al., 1998(a)], [Decker et al., 1999]) applies 
artificial intelligence techniques to improve access to heterogeneous, 
scattered and semistructured information sources as they are presented on the 
World Wide Web or organization-wide intranets. It relies on the use of 
ontologies to annotate Web pages, formulate queries, and derive answers. 
Basically you define an ontology and use it to annotate/structure/wrap your 
Web documents, and somebody else can make use of Ontobroker’s advanced 
query and inference services to consult your knowledge. To achieve this goal, 
Ontobroker provides three interleaved languages and two tools. It provides a 
broker architecture with three core elements: a query interface for 
formulating queries; an inference engine used to derive answers; and a 
webcrawler to collect the required knowledge from the Web. It provides a 
representation language for formulating ontologies. A subset of this is used 
to formulate queries, i.e. to define the query language. An annotation 
language is offered to enable knowledge providers to enrich Web documents 
with ontological information. The strength of Ontobroker is the close 
coupling of informal, semi-formal, and formal information and knowledge. 
This supports their maintenance and provides a service that can be used more 
generally for integrating knowledge-based reasoning with semi-formal 
documents. 

15 http://www.ontoprise.de 

16 http://www.aidministrator.nl 

Fig. 17 Ontology-enabled information navigation 



4.2.1 The Languages 



4.2.1.1 The Annotation Language 

Ontobroker provides an annotation language called HTML^A to enable the 
annotation of HTML documents with machine-processable semantics. For 
example, the following HTML page states that the text string “Richard 
Benjamins” is the name of a researcher where the URL of his homepage is 
used as his object id. 

<html><body> 

<a onto="page:Researcher"><h2>Welcome to my homepage</h2> 
My name is <a onto="[name=body]">Richard Benjamins</a>. 

</body></html> 

Two important design decisions of HTML^A were (1) to smoothly integrate 
semantic annotations into HTML and (2) to prevent the duplication of 
information. The reason for the first decision was to lower the threshold for 
using the annotation language. People who are able to write HTML can use it 
straightforwardly as a simple extension. The pages remain readable by 
standard browsers like Netscape Navigator or Microsoft Explorer, and 
information providers can still rely on standard Web techniques. The 
rationale underlying the second decision is more fundamental in nature. We 
did not wish to add additional data; instead we wished to make explicit the 
semantics of already available data. The same piece of data (i.e., “Richard 
Benjamins”) that is rendered by a browser is given a semantics saying that 
this ASCII string provides the name of a researcher. This is a significant 
difference between our approach and those of SHOE 17 [Luke et al. 1997], 
RDF [Lassila & Swick, 1999] and annotations used in information retrieval. 

In Ontobroker, a frame-based approach has been chosen for the annotation 
language corresponding to the kind of language used for representing the 
ontology. Three primitives are provided to annotate Web documents: 

• An object can be defined as an instance of a certain class. 

• The value of an object’s attribute can be set. 

• A relationship between two or more objects may be established. 




[Screenshot: the home page of Richard Benjamins as rendered by a browser, together with the beginning of its annotated HTML source:] 

<HTML> 
<HEAD> 
<a onto="page:Researcher"> </a> 
<TITLE> Richard Benjamins </TITLE> 
</HEAD> 
<H1> <A HREF="pictures/id-rich.gif"> 
<IMG align=middle SRC="pictures/richard.gif"></A> 
<a onto="page[photo=href]" 
HREF="http://www.iiia.csic.es/~richard/pictures/richard.gif"></a> 
<a onto="page[firstName=body]"> Richard </a> 
<a onto="page[lastName=body]"> Benjamins </a> 
</h1> <p> <A onto="page[affiliation=body]" HREF="#card"> 
Artificial Intelligence Research Institute (IIIA)</A> 



Fig. 18 An example of an annotated Web page 



All three primitives are expressed by using an extended version of a 
frequent HTML tag, i.e. the anchor tag <a ...> ... </a>. The anchor tag is 
usually used to define named locations on a Web page and links to other 
locations. Thus, it contains the attributes name and href to fulfill these 
purposes. For ontologically annotating a Web page, Ontobroker adds 
another attribute to the syntax of the anchor tag, namely the onto attribute. 
Typically, a provider of information first defines an object as an element of a 
certain class. To express this in its HTML extension he would use the 
following line on a home page: 

<a onto='"http://www.iiia.csic.es/~richard":Researcher'></a> 

17 SHOE (see [Luke et al., 1996], [Luke et al. 1997]) introduced the idea of using ontologies 
for annotating Web sources. There are two main differences from Ontobroker. First, the 
annotation language is not used to annotate existing information on Web pages, but to add 
additional information and annotate them. That is, in SHOE information must be repeated and 
this redundancy may cause significant maintenance problems. For example, an affiliation must 
be provided once as a text string rendered by the browser and again as annotated 
metainformation. In this respect, SHOE is close to metatags in HTML. Ontobroker uses the 
annotations to directly add semantics to textual information that is also rendered by a browser. 
A second difference is the use of inference techniques and axioms to infer additional 
knowledge. SHOE relies only on database techniques. Therefore, no further inference service 
is provided. Ontobroker uses an inference engine to answer queries. Therefore, it can make use 
of rules that provide additional information. 

URLs are used as object ids. Each class could possibly be associated with 
a set of attributes. Each instance of a class can define values for these 
attributes. For example, the ontology contains an attribute email for each 
object of class Researcher. If Richard Benjamins wished to provide his email 
address, he would use this line on his home page: 

<a onto='"http://www.iiia.csic.es/~richard"[email="mailto:richard@iiia.csic.es"]'></a> 

The object denoted by “http://www.iiia.csic.es/~richard” has the value 
“mailto:richard@iiia.csic.es” for the attribute email. An example of an 
annotated Web page is given in Fig. 18. 

In terms of a knowledge-based system, the annotation language provides 
the means to express factual knowledge (ground literals). Further knowledge 
is provided by the ontology. The ontology defines the terminology (i.e., 
signature) and may introduce further rules (i.e., axioms) that allow the 
derivation of additional facts that are not stated as extensions. 
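
How such annotations turn into ground facts can be sketched in a few lines of Python. This is not Ontobroker's webcrawler or parser. It is a minimal illustration that assumes the onto attribute has the shape object:Class or object[attribute=value], where the object is either the keyword page, a quoted URL, or omitted, and where the value is body (the anchor text), href (the link target), or a literal. The class name `OntoExtractor`, the function `extract_facts`, and the fact tuples are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed shape of an onto value: [object][:Class][[attribute=value]]
ONTO = re.compile(
    r'^\s*(?P<obj>"[^"]*"|\w+)?\s*'
    r'(?::\s*(?P<cls>\w+))?\s*'
    r'(?:\[\s*(?P<attr>\w+)\s*=\s*(?P<val>[^\]]+?)\s*\])?\s*$'
)


class OntoExtractor(HTMLParser):
    """Collects ground facts from anchor tags that carry an onto attribute."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.facts = []
        self._pending = None  # (object, attribute) waiting for the anchor text
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "a" or not attrs.get("onto"):
            return
        match = ONTO.match(attrs["onto"])
        if match is None:
            return
        obj = match["obj"]
        # "page" (or no object at all) refers to the annotated page itself.
        obj = self.page_url if obj in (None, "page") else obj.strip('"')
        if match["cls"]:
            self.facts.append(("instance_of", obj, match["cls"]))
        if match["attr"]:
            value = match["val"]
            if value == "body":    # the value is the text enclosed by the anchor
                self._pending, self._text = (obj, match["attr"]), []
            elif value == "href":  # the value is the anchor's link target
                self.facts.append(("value", obj, match["attr"], attrs.get("href")))
            else:                  # the value is given literally
                self.facts.append(("value", obj, match["attr"], value.strip('"')))

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._pending is not None:
            obj, attr = self._pending
            text = " ".join("".join(self._text).split())
            self.facts.append(("value", obj, attr, text))
            self._pending = None


def extract_facts(html, page_url):
    """Return the list of facts annotated in one HTML page."""
    parser = OntoExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.facts
```

The design mirrors the second design decision of the annotation language: for body annotations the fact value is taken from the text that the browser renders, so nothing has to be stated twice.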



4.2.1.2 The Representation Languages 

A representation language is used to formulate an ontology. This language is 
based on Frame logic [Kifer et al., 1995]. F-logic is a language for specifying 
object-oriented databases, frame systems, and logical programs. Its main 
achievement is to integrate conceptual modeling constructs (classes, 
attributes, domain and range restrictions, inheritance, axioms) into a coherent 
logical framework. Basically it provides classes, attributes with domain and 
range definitions, is-a hierarchies with set inclusion of subclasses and 
multiple attribute inheritance, and logical axioms that can be used to further 
characterize relationships between elements of an ontology and its instances. 
The representation language introduces the terminology that is used by the 
annotation language to define the factual knowledge provided by HTML 
pages on the Web. An example is provided in Figure 19. It defines the class 
Object and its subclasses Person and Publication. Some attributes and some 
rules expressing relationships between them are defined, for example, if a 
publication has a person as an author then respectively the author should have 
this publication as one of his publications. Semantically, the language for 
defining rules is the fragment of first-order logic that can be transformed via 
Lloyd-Topor transformations [Lloyd & Topor, 1984] into Horn logic. 
Syntactically it is different as it incorporates object-oriented modeling 
primitives. Ontobroker uses a subset of F-logic for defining the ontologies: 

• Class definition: 

c[] 

defines a class with name c. 

Class Hierarchy 

Object[] 
Person :: Object 
Employee :: Person 
AcademicStaff :: Employee 
Researcher :: AcademicStaff 
Publication :: Object 

Attribute Definitions 

Person[ 
firstName =>> STRING; 
lastName =>> STRING; 
eMail =>> STRING; 
publication =>> Publication] 

Employee[ 
affiliation =>> Organization] 

Researcher[ 
researchInterest =>> ResearchTopic] 

Rules 

FORALL P1, P2 
P1[cooperatesWith ->> P2] 
<- 
P2[cooperatesWith ->> P1] 

FORALL P, Pub 
Pub:Publication[author ->> P] 
<-> 
P:Person[publication ->> Pub] 

Key to Figure 19: 

c1 :: c2 means that c1 is a subclass of c2. 
c[a =>> r] means that an attribute a is of domain c and range r. 
o : c[a ->> v] means that o is an element of c and has the value v for a. 
<- means logical implication and <-> logical equivalence. 

Fig. 19 An excerpt from an ontology (taken from [Benjamins et al., 1999]) 



• Attribute definition: 

c[a =>> {c1, ..., cn}] 

implies that the attribute a can be applied to the elements of c (it is also 
possible to define attributes applied to classes) and an attribute value 
must be a member of all classes c1, ..., cn. 

• Is-a relationship: 

c1 :: c2 

defines c1 as a subclass of c2, which implies that: 

• all elements of c1 are also elements of c2, 

• all attributes and their value restrictions defined for c2 are also 
defined for c1, and 

• multiple attribute inheritance exists, i.e. 

c :: c1[a =>> {c3}] and c :: c2[a =>> {c4}] imply 
c[a =>> {c3, c4}] 

• Is-element-of relationship: 

e : c 

defines e as an element of the class c. 

• Rules like 

• FORALL x,y x[a ->> y] <- y[a ->> x]. 

• FORALL x,y x:c1[a1 ->> y] <-> y:c2[a2 ->> x]. 
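
The effect of multiple attribute inheritance can be illustrated with a short Python sketch. It is not part of Ontobroker. It assumes an acyclic is-a hierarchy stored in a hypothetical dictionary `subclass_of` and range restrictions stored in a hypothetical dictionary `ranges` keyed by (class, attribute), and it collects all range restrictions that a class inherits for one attribute.

```python
def inherited_ranges(cls, attr, subclass_of, ranges):
    """Union of the range restrictions on attr for cls and all its superclasses.

    Assumes that the is-a hierarchy contains no cycles.
    """
    result = set(ranges.get((cls, attr), ()))
    for parent in subclass_of.get(cls, ()):
        result |= inherited_ranges(parent, attr, subclass_of, ranges)
    return result
```

For c :: c1 and c :: c2 with c1[a =>> {c3}] and c2[a =>> {c4}], the function returns {c3, c4} for the class c, so a value of a on an element of c must belong to both c3 and c4.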



4.2.1.3 The Query Languages 

The query language is defined as a subset of the representation language. The 
elementary expression is: 



x ∈ c ∧ attribute(x) = v 

Written in Frame logic as: 

x[attribute -> v] : c 

In the head of F-logic rules, all variables are universally quantified. In the body, 
variables may be either universally or existentially quantified. Universally quantified 
variables must additionally be bound by a positive atom in the body. The Lloyd-Topor 
transformation handles these quantifications as follows. Existential 
quantifiers in the body may be dropped, because every variable in the body of 
a rule is implicitly existentially quantified. A universal quantification, forall y p(y), 
in the body is transformed to 

¬ exists y ¬ p(y). 

Then the Lloyd-Topor transformation produces a set of rules out of this. Queries 
are handled as rules without a head. Thus the above-mentioned conditions for 
quantifications hold here too. 

Complex expressions can be built by combining these elementary 
expressions with the usual logical connectives (∧, ∨, ¬). The following query 
asks for all abstracts of the publications of the researcher “Richard 
Benjamins”. 

x[name -> "Richard Benjamins"; publication ->> {y[abstract -> z]}] : Researcher 

The variable substitutions for z are the desired abstracts. 
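
Read operationally, the query above is a join over three attribute facts plus a class membership test. The following Python sketch spells this out over a small in-memory fact base. It is an illustration only: the representation of facts as (object, attribute, value) triples, the set `instance_of` of (object, class) pairs, and the function name are assumptions, and class membership is taken as stated without any inheritance reasoning.

```python
def abstracts_of_researcher(name, instance_of, value):
    """Return the bindings of z for a researcher with the given name."""
    return sorted(
        z
        for (x, attribute, v) in value
        if attribute == "name" and v == name and (x, "Researcher") in instance_of
        for (x2, attribute2, y) in value
        if x2 == x and attribute2 == "publication"
        for (y2, attribute3, z) in value
        if y2 == y and attribute3 == "abstract"
    )
```

Each for clause corresponds to one variable of the query (x, y, and z), and each if clause to one condition that the binding must satisfy.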

Fig. 20 Stages and languages used in the Inference Engine. 



4.2.2 The Tools 

Ontobroker relies on two tools that give it “life”: a webcrawler and an 
inference engine. The webcrawler collects pages from the Web, extracts their 
annotations, and parses them into the internal format of Ontobroker. The 
inference engine takes these facts, together with the terminology and axioms 
of the ontology, and derives the answers to user queries. To achieve this it has 
to do a rather complex job. First, it translates Frame Logic into predicate 
logic and, second, it translates predicate logic into Horn logic via Lloyd- 
Topor transformations [Lloyd & Topor, 1984]. The translation process is 
summarized in Fig. 20. 

As a result we obtain a normal logic program. Standard techniques from 
deductive databases can be used to implement the last stage: the bottom-up 
fixpoint evaluation procedure. Because negation in the clause body is 
allowed, we have to carefully select an appropriate semantics and evaluation 
procedure. If the resulting program is stratified, Ontobroker uses simple 
stratified semantics and evaluates it with a technique called dynamic filtering 
(see [Kifer & Lozinskii, 1986], [Fensel et al., 1998(b)]), which focuses the 
inference engine on the relevant parts of a minimal model required to answer 
the query. Dynamic filtering combines bottom-up and top-down evaluation 
techniques. The top-down part restricts the set of facts which has to be 
computed to a subset of the minimal model. Thus infinite minimal models are 
also possible, because only this subset has to be finite. 18 The translation of 
Frame logic usually results in a logic program with only a limited number of 
predicates, so the resulting program is often not stratified. In order to deal 
with non-stratified negation, Ontobroker uses the well-founded model 
semantics [Van Gelder et al., 1991] and computes this semantics with an 
extension of dynamic filtering. 

18 Syntactical rules that ensure that the subset of minimal model that has to be computed 
remains finite are described in [Fensel et al., 1998(b)]. 

Fig. 21 The tabular query interface (the example query asks for a researcher with the name “Fensel”, the publications of this author, and their abstracts; each answer binds V1 to a home page URL, V2 to a publication URL, and V3 to the abstract text) 
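
The bottom-up fixpoint evaluation mentioned above can be sketched in its simplest, naive form. The Python code below is not Ontobroker's inference engine: it handles only positive rules, so it implements neither dynamic filtering nor the well-founded semantics needed for negation. Facts are assumed to be (predicate, subject, object) tuples, each rule is a hypothetical Python function from the current model to the facts it derives, and the class tests of the second rule in Figure 19 (Pub:Publication, P:Person) are omitted for brevity.

```python
def fixpoint(facts, rules):
    """Apply all rules repeatedly until no new fact can be derived."""
    model = set(facts)
    while True:
        derived = {fact for rule in rules for fact in rule(model)} - model
        if not derived:
            return model
        model |= derived


def symmetric_cooperation(model):
    """P1[cooperatesWith ->> P2] <- P2[cooperatesWith ->> P1]."""
    return {
        ("cooperatesWith", b, a)
        for (predicate, a, b) in model
        if predicate == "cooperatesWith"
    }


def author_publication(model):
    """Pub[author ->> P] <-> P[publication ->> Pub], read in both directions."""
    derived = set()
    for (predicate, subject, obj) in model:
        if predicate == "author":
            derived.add(("publication", obj, subject))
        elif predicate == "publication":
            derived.add(("author", obj, subject))
    return derived
```

Starting from one cooperatesWith fact and one author fact, the evaluation adds the inverse cooperation and the corresponding publication fact and then stops, because a further round derives nothing new.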

A hyperbolic presentation of the ontology and a tabular interface improve 
the accessibility of Ontobroker. Expecting a normal Web user to type queries 
in a logical language and to browse large formal definitions of ontologies is 
not very realistic. Therefore, the structure of the query language is exploited 
to provide a tabular query interface as shown in Figure 21. We also need 
support for selecting classes and attributes from the ontology. To allow the 
selection of classes the ontology has to be presented in an appropriate 
manner. Usually, an ontology can be represented as a large hierarchy of 
concepts. With regard to the handling of this hierarchy a user has at least two 
requirements: First, he wants to scan the vicinity of a certain class looking for 
classes better suited to formulating a certain query. Second, he needs an 
overview of the entire hierarchy to allow for a quick and easy navigation 
from one class in the hierarchy to another class. These requirements are met 
by a presentation scheme based on hyperbolic geometry [Lamping et al., 
1995], where classes in the center are depicted with a large circle and classes 
at the border of the surrounding circle are only marked with a small circle 
(see Figure 22). The visualization technique allows rapid navigation to 
classes far away from the center as well as a closer examination of classes 
and their vicinity. When a user selects a class from the hyperbolic ontology 
view, the class name appears in the class field of the tabular interface and the 
user can select one of the attributes from the attribute choice menu as the 
preselected class determines the possible attributes. Based on these interfaces 
Ontobroker automatically derives the query in textual form and presents the 
result of the query. 



4.2.3 On2broker 

Ontobroker was presented as a means to improve access to information 
provided on intranets and on the Internet (see [Fensel et al., 1997]). 
On2broker (see [Fensel et al., 1999(a)], [Fensel et al., 2000(a)]) is one of the 
successor systems of Ontobroker. The major new design decisions in 
On2broker are the clear separation of the query and inference engines and the 
integration of new Web standards like XML and RDF. These decisions 
address two significant complexity problems of Ontobroker: the 
computational inference effort required for a large number of facts and the 
human annotation effort necessary for adding semantics to HTML 
documents. 

The overall architecture of On2broker, which includes four basic engines 
representing different aspects, is provided in Fig. 23. 

• The query engine receives queries and answers them by checking the 
content of the databases that were filled by the info agent and the 
inference engine. 

• The info agent is responsible for collecting factual knowledge from the 
Web using various types of meta-annotations, direct annotations like 
XML and, in the future, also text mining techniques. 

• The inference engine uses facts and ontologies to derive additional 
factual knowledge that is only provided implicitly. It frees knowledge 
providers from the burden of specifying each fact explicitly. 

• The database manager is the backbone of the entire system. It receives 
facts from the info agent, exchanges facts as input and output with the 
inference engine, and provides facts to the query engine. 

Fig. 22 The hyperbolic ontology interface 

Fig. 23 On2broker architecture 



Again, ontologies are the overall structuring principle. The info agent uses 
them to extract facts, the inference engine to infer facts, the database manager 
to structure the database, and the query engine to provide help in formulating 
queries. 



4.2.3.1 The Database Manager: Decoupling Inference and Query Response 19 

In the worst case, a query may lead to the evaluation of the entire minimal 
model of a set of facts and rules. This is a computationally hard problem (see 
[Brewka & Dix, 1999]). In other cases, predicate symbols and constants are 
used to divide the set of facts into subsets in order to omit those subsets which 
do not contribute to the answer. This normally reduces the evaluation effort 
considerably. Ontobroker allows very flexible queries such as “what 
attributes does a class have”. As a consequence, the entire knowledge is 
represented by only a few predicates, such as the predicate value which 
relates a class c to its attribute att and the corresponding attribute value v 
(value(c, att, v)). This reification strategy implies that the set of facts is only 
divided into a few subsets. Using few predicates has the consequence that 
nearly every rule set is not stratified (see [Ullman, 1988]) if negation in rules 
is allowed. Therefore Ontobroker has to make use of the well-founded model 
semantics (see [Van Gelder et al., 1991]), which also allows us to evaluate 
non-stratified rule sets. 

19 For the database community On2broker is a kind of data warehouse [Jarke et al., 2002] for 
data on the Web. Queries are not run on the sources to which On2broker provides access, but 
on a database into which the source content has been extracted. In addition to the facts that can 
be found explicitly in the sources, the system also applies rules to derive additional 
information. 

Both points, the small number of predicates and the well-founded model 
semantics, lead to severe efficiency problems. Such an inference approach 
could only be applied to knowledge bases with fewer than 100,000 facts. 
However, it is clear that such an approach needs to be applicable to millions of 
facts in order to be of practical relevance. This pointed to a serious 
shortcoming of the overall system architecture of Ontobroker. In Ontobroker, 
the query engine and the inference engine are actually one engine. The 
inference engine receives a query and derives the answer. However, an 
important decision was already made in the design of Ontobroker when the 
webcrawler and the inference engine were separated. The webcrawler 
periodically collects information from the Web and caches it. The inference 
engine uses this cache when answering queries. The decoupling of inferences 
and fact collection is done for efficiency reasons. The same strategy is used 
by search engines on the Web. A query is answered with the help of their 
indexed cache and not by starting to extract pages from the Web. On2broker 
refines the architecture of Ontobroker by introducing a second separation: 
separating the query and inference engines. The inference engine works as a 
daemon in the background. It takes facts from a database, infers new facts, and 
writes these results back into the database. The query engine does not 
directly interact with the inference engine. Instead it takes facts from the 
database. This has several advantages: 

• Whenever inference is a time-critical activity, it can be performed in the 
background independent of the time required to answer the query. 

• Using database techniques for the query interface and its underlying facts 
provides robust tools that can handle large masses of data. 

• It is relatively simple to include things like wild cards, term similarity, 
and ranking in the query answering mechanism. These can now be 
directly integrated into the SQL query interface (i.e., in part they are 
already provided by SQL) and do not require any changes to the much 
more complex inference engine. 
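
The separation described above can be sketched with Python's built-in sqlite3 module. This is a toy illustration and not On2broker's database manager: the single table `fact`, modelled loosely on the value(c, att, v) predicate, the one hard-coded rule (author implies publication), and all function names are assumptions. The point is only that the inference step writes derived facts back into the database, while the query step is plain SQL and gets wild cards through LIKE without touching the inference code.

```python
import sqlite3


def make_fact_db(facts):
    """Create an in-memory database holding (obj, att, val) facts."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE fact (obj TEXT, att TEXT, val TEXT, UNIQUE (obj, att, val))"
    )
    db.executemany("INSERT OR IGNORE INTO fact VALUES (?, ?, ?)", facts)
    return db


def count_facts(db):
    return db.execute("SELECT COUNT(*) FROM fact").fetchone()[0]


def inference_pass(db):
    """One background pass: derive publication facts from author facts.

    Returns the number of facts that were newly added to the database.
    """
    before = count_facts(db)
    db.execute(
        "INSERT OR IGNORE INTO fact "
        "SELECT val, 'publication', obj FROM fact WHERE att = 'author'"
    )
    return count_facts(db) - before


def answer(db, att, pattern):
    """Query engine: look up stored facts only; pattern may use SQL wild cards."""
    rows = db.execute(
        "SELECT obj, val FROM fact WHERE att = ? AND val LIKE ? ORDER BY obj, val",
        (att, pattern),
    )
    return rows.fetchall()
```

A background process would call inference_pass until it returns 0, while answer can be called at any time and simply sees whatever facts have been derived so far.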

The strict separation of query and inference engines can be weakened for 
cases where this separation would cause disadvantages. In many cases it may 
not be necessary to enter the entire minimal model into a database. Many 
facts are of incidental or no interest when answering a query. The inference 
engine of On2broker incorporates this in its dynamic filtering strategy which 
uses the query to focus the inference process (see [Fensel et al., 1998(b)]). 
You can make use of this strategy when deciding which facts are to be put 
into the database. Either you limit the queries that can be processed by the 
system or you replace real entries in the database with a virtual entry 
representing a query to the inference engine. The latter may necessitate a long 
delay in answering, which, however, may be acceptable for user agents which 
collect information from the Web in a background mode. Finally, you can 
cache the results of queries to speed up the process when such queries 
reoccur. In many application contexts the full flexibility of the query interface 
is not as necessary as information answering a set of predefined queries. This 
also holds for the automatic generation of documents. Here, the document 
results from a query that is executed when the document is retrieved by a 
user. Therefore, such a document corresponds to a predefined query. 

4.2.3.2 The Info Agent 

The info agent extracts factual knowledge from Web sources. We will discuss 
the four possibilities that are provided by On2broker. 

First, On2broker uses Ontobroker’ s minor extension of HTML called 
HTML a to integrate semantic annotations into HTML documents. On2broker 
uses a webcrawler to collect pages from the Web, extracts their annotations, 
and parses them into the internal format of On2broker. 

Second, you can make use of wrappers for automatically extracting 
knowledge from Web sources. Annotation is a declarative way to specify the 
semantics of information sources. A procedural method is to write a program 
(called a wrapper) that extracts factual knowledge from Web sources. 
Writing wrappers for stable information sources enables the application of
On2broker to structured information sources that do not make use of an 
annotation language to make explicit the semantics of the information 
sources. 

Third, On2broker can make use of RDF annotations (see [Lassila & 
Swick, 1999]). The info engine of On2broker extracts RDF descriptions,
and the inference engine of On2broker for RDF is called SiLRI (Simple 
Logic-based RDF Interpreter) [Decker et al., 1998]. 

The fourth interesting possibility is the increased use of XML. In many 
cases, the tags defined by a DTD may carry semantics that can be used for 
information retrieval. For example, assume a DTD that defines a person tag 
and within it a name and phone number tag.

<PERSON>
  <NAME>Richard Benjamins</NAME>
  <PHONE>+3120525-6263</PHONE>
</PERSON>

Then the information is directly accessible with its semantics and can be 
processed later by Ontobroker for query answering. Expressed in Frame 
Logic, we get: 

url[NAME ->> "Richard Benjamins"; PHONE ->> +3120525-6263] : PERSON
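
The translation from tagged XML to a frame can be sketched as follows. This is a minimal illustration, not the extraction code of Ontobroker: the function xml_to_frame and the object identifier "url" are assumptions made for the example, every child tag is simply treated as a multi-valued attribute, and every value is written as a quoted string.

```python
import xml.etree.ElementTree as ET


def xml_to_frame(xml_text, object_id):
    """Render a flat XML element as an F-logic-style frame (as a string).

    The root tag becomes the class, each child tag an attribute name,
    and each child's text the attribute value.
    """
    root = ET.fromstring(xml_text)
    slots = []
    for child in root:
        value = (child.text or "").strip()
        slots.append(f'{child.tag} ->> "{value}"')
    return f"{object_id}[{'; '.join(slots)}] : {root.tag}"


SAMPLE = (
    "<PERSON>"
    "<NAME>Richard Benjamins</NAME>"
    "<PHONE>+3120525-6263</PHONE>"
    "</PERSON>"
)

frame = xml_to_frame(SAMPLE, "url")
```

The sketch relies entirely on the tag names carrying the intended meaning, which is exactly the assumption made about the DTD in the text.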



4.2.4 Conclusions 

Ontobroker uses semantic information to guide the query answering process. 
It provides answers with a well-defined syntax and semantics that can be 
directly understood and further processed by automatic agents or other 
software tools. It enables a homogeneous access to information that is 
physically distributed and heterogeneously represented on the Web and it 
provides information that is not directly represented as facts on the Web, but 
which can be derived from other facts and some background knowledge. 
Still, the range of problems it can be applied to is much broader than 
information access and identification in semistructured information sources. 
On2broker is also used to create and maintain such semistructured 
information sources, i.e. it is a tool for Web site construction and 
restructuring. 

Automatic document generation extracts information from weakly 
structured text sources and creates new textual sources. Assume distributed 
publication lists of members of a research group. The publication list for the 
whole group can automatically be generated by a query to On2broker. A 
background agent periodically consults On2broker and updates this page. The 
gist of this application is that it generates semistructured information 
presentations from other semistructured ones. The results of a query to 
On2broker may be inserted as JavaScript data structures into the HTML
stream of a Web page. This allows the insertion of content into a Web page 
which is dynamically generated by On2broker. 

Maintenance of weakly structured text sources helps to detect
inconsistencies among documents and between documents and external
sources, i.e., to detect incorrectness. Maintaining intranets of large
organizations and companies is becoming a serious business, because such
networks already provide several million documents. WebMaster ([van
Harmelen & van der Meer, 1999]) has developed a constraint language for
formulating integrity constraints for XML documents (e.g., a publication on a
page of a member of the group must also be included in the publication list of 
the entire group). Here the ontology is not used to derive additional facts, but 
rather to ensure that the provided knowledge is consistent and correct. 



4.3 The Ontoprise Tool Suite 

Meanwhile a company called ontoprise 20 has been set up to develop software
products from prototypes developed at the University of Karlsruhe. All tools
have already been deployed in large customer projects and matured for
company-wide usage. The OntoEdit ontology engineering tool is in
widespread use with more than 3000 installations worldwide. In this section, 
we will describe some of the tools developed there to give an example of a 
professional tool environment. 

Navigating through a (possibly unknown) portal is a rather difficult task in 
general. Information retrieval may of course help, but it may also be more of 
a hindrance, because the user may not be acquainted with the 
conceptualization that underlies the portal. Hence, query and navigating 
capabilities must be provided and the conceptual background must be made 
transparent to the user. An essential feature of a community Web portal is the 
contribution of information from all (or at least many) members of the 
community. Though they share some common understanding, the pieces of 
information they may contribute may come in many different (legacy) 
formats. Hence, one needs a set of methods and tools that may account for the 
diversity of information sources of potential interest to the community portal. 
These methods and tools must be able to cope with different syntactic 
mechanisms and they must be able to integrate different semantic formats 
based on the common ontology. 

The example that we draw from in the rest of this chapter is the (KA)²
portal [Benjamins et al., 1999] introduced earlier. This initiative was 
conceived for semantic knowledge retrieval from the Web, building on 
knowledge created in the community. It was built on manual semantic 
annotation for integration and retrieval of facts from semantically annotated 
Web pages, which belonged to members of the knowledge acquisition 
community. Given this basic scenario, which may be easily transferred to 
other settings for community Web portals, we describe how the tools and 
technologies support the development of this ontology-based (KA)²
community Web portal (see Figure 24).



4.3.1 Architecture

The overall architecture and environment of an ontology-based system is 
depicted in Figure 25: The backbone of the system consists of the knowledge 
warehouse, i.e. the data repository, and the OntoBroker system, i.e. the 
principal inferencing mechanism. 

OntoBroker is an inference engine for F-logic. F-logic enables reasoning 
about the ontology itself, i.e. about classes, subclasses, their relations and 
attributes, and about instances of classes. It is the server in a client-server 
architecture and represents the run-time system in ontology-based 
applications. OntoBroker is implemented in Java and may be accessed through a
Java API, an encapsulated socket protocol, and a Web service interface.
Access through a DLL enables a smooth integration into the Microsoft
world. On top of these basic access methods, Web technologies such as
JScript, JSP, ASP, and PHP are supported. OntoBroker may be configured as a
distributed system where different OntoBroker servers collaborate on
different computers. A deep integration into databases allows very large sets
of facts to be processed. OntoBroker comes with a large set of connectors to
other applications such as databases, index servers, and Web services.
OntoBroker is also part of OntoEdit, which enables ontologies to be tested
during development and provides a smooth integration of the engineering
environment with the runtime environment.



Fig. 24 Screenshot of the (KA)² community Web portal

OntoEdit is an ontology engineering environment. It is an application that
provides a graphical ontology editing environment (which enables inspection, 
browsing, codification and modification of ontologies and thus supports their 
development and maintenance) and an extensible architecture for adding new 
plug-ins. Ontologies may be developed collaboratively on a set of OntoEdit 
clients. The conceptual model of an ontology is stored internally using a 
powerful ontology model, which can be mapped onto different, concrete 
representation languages. Ontologies are stored in relational databases and 
can be implemented in XML, F-logic, RDF(S), and DAML+OIL. 

OntoAnnotate is a semi-automatic annotation tool that enables the 
collection of knowledge from documents and Web pages, creating a 
document base including metadata and enriching Web resources or intranets 
with metadata. It allows the annotation not only of static HTML documents, but
also of MS Word and MS Excel documents. OntoAnnotate uses OntoBroker as
a server and thus is smoothly integrated into the OntoBroker/OntoEdit 
environment. 

In the following we give an overview of each module of the tool 
environment, starting with the ontology engineering workbench OntoEdit, 
followed by the inference engine OntoBroker and the annotation tool 
OntoAnnotate. 




Fig. 25 Tool Architecture 









Fig. 26 OntoEdit visualizer 



4.3.2 OntoEdit 

Similarly to software engineering and as proposed by [Lopez et al., 1999], the 
development of ontologies is divided into different phases: a requirements 
specification phase, a refinement phase, and an evaluation phase. 

Requirement Specification. You start ontology development by collecting 
requirements for the envisaged ontology. By nature this task is performed by 
a team of experts for the domain accompanied by experts for modeling. The 
outcome of this phase is a document that contains all relevant requirement 
specifications and a semi-formal ontology description (see Figure 26). 

Ontology Refinement. In the ontology refinement phase the semi-formal 
description of the ontology is extended and completely formalized in order to 
make it machine-processable. In this phase you can take advantage of the 
inferencing capabilities of OntoBroker built into OntoEdit.

Reuse in the refinement phase comes in two flavors. First, you may
exploit existing structures like existing ontologies or thesauri, e.g. ones that 
are stored in databases, by semantically integrating them into the target 
ontology. As in general information integration [Wiederhold, 1997], this 
involves two steps. The first step concerns the mapping onto a common data 
model. For this purpose, you can take advantage of the inference engine’s
capabilities, viz. to read in RDF(S), to connect to relational databases, and to
provide further built-ins (e.g., for connection to XML repositories). The
outcome of this step is data in F-logic structures, but with some rather
arbitrary semantics. In the second step, you can build rules to map the
outcome of the first step into the desired categories. For instance, you may 
map a database table-like structure into a target structure of sub- and super- 
concepts. An example may be given with a database table that contains 
WordNet hyponyms and synonyms. Our example is based on a locally
installed MySQL database system that contains a WordNet database.

1. In the first step we map this table into an equivalent predicate
HYPONYM:

    FORALL C,D HYPONYM(C,D)
        <- DBACCESS(hyponym, F('sub', C, 'super', D),
                    'mySQL', 'wordnet', 'localhost').

2. In the second step, we define the objects of the predicate HYPONYM to be
subconcepts of Computer if it is known that one of their super-concepts is
a subconcept of Computer:

    FORALL C,D C::D <- HYPONYM(C,D) AND D::Computer.

Thus, one may easily reuse existing thesauri or database schemas in order 
to generate a large number of concepts fast. 
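
The two steps can be mimicked outside of F-logic to see what they compute. The following Python sketch is an illustration under several assumptions: an in-memory SQLite table stands in for the MySQL WordNet database, the rows are made up for the example, the function names load_hyponyms and derive_subconcepts are hypothetical, and Computer is treated as a subconcept of itself so that its direct hyponyms are picked up.

```python
import sqlite3


def load_hyponyms(conn):
    """Step 1: expose the table rows as HYPONYM(C, D) pairs."""
    return conn.execute('SELECT "sub", "super" FROM hyponym').fetchall()


def derive_subconcepts(rows, root="Computer"):
    """Step 2: apply C::D <- HYPONYM(C, D) AND D::root until nothing changes.

    Returns the derived (subconcept, superconcept) edges.
    """
    below = {root}  # concepts known to be at or below the root
    edges = set()
    changed = True
    while changed:
        changed = False
        for sub, sup in rows:
            if sup in below and (sub, sup) not in edges:
                edges.add((sub, sup))
                below.add(sub)
                changed = True
    return edges


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE hyponym ("sub" TEXT, "super" TEXT)')
conn.executemany(
    "INSERT INTO hyponym VALUES (?, ?)",
    [("Laptop", "Computer"), ("Notebook", "Laptop"), ("Apple", "Fruit")],
)
subclass_edges = derive_subconcepts(load_hyponyms(conn))
```

Only rows that lead up to Computer contribute to the concept hierarchy; the unrelated row is read from the database but never becomes part of the ontology.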

In addition, you may reuse axiom definitions from a library of ontology
modules that are distinguished by a name-space mechanism. A set of axiom
definitions specified for one domain can be reused in another domain,
because the inference engine can store axioms in a library and load them
into different name-spaces.

Besides integrating axioms from a library into the ontology, one may 
apply axioms in order to enforce constraints on the ontology. We distinguish 
three major types: 

1. Axioms of F-logic: These are an integral part of the F-logic definition.
However, not all of them are needed for inferencing during the usage
of the ontology. For instance, type coercion at the conceptual level:

    FORALL C,D,E,A,T E::T <- C[A =>> T] AND D::C[A =>> E].

Specify E as a subclass of T if some concept C has an attribute A of
type T and a subclass D of C has an attribute A with type E.

2. Axioms for domain-specific consistency: These enforce consistency
constraints at building time. For example, they may ensure that the
domain-specific relation HASPHYSICALPART is without cycles:

    NONCYCLIC(HASPHYSICALPART).

    FORALL X,A UNDEFINED <- NONCYCLIC(A) AND X[A ->> X].

HASPHYSICALPART is of type NONCYCLIC. Indicate a consistency
violation if an attribute A is of type NONCYCLIC and X is related via A to
itself.

3. Axioms enforcing modeling policies: Such axioms do not add to the 
semantic description, but they are applied in order to enforce semiotic 
constraints, e.g., that no subconcept should have more than n 
subconcepts, that no hierarchy should be deeper than m, or that every 
attribute symbol should begin with a lower case letter: 

    FORALL A UNDEFINED <-
        EXISTS X,Y X[A =>> Y] AND NOT regexp('^[a-z]', A).

Indicate consistency violation if there is an attribute symbol A between 
some classes X, Y and it does not match with a string beginning with 
lower-case alphabetical letters. 
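
The second and third kind of axiom amount to checks that are run over the ontology while it is being built. A minimal Python sketch of such checks is given here; it is not how OntoEdit evaluates axioms. The function check_policies, the triple representation, and the sample data are assumptions for illustration only. Like the NONCYCLIC axiom above, the cycle check only reports an object that is related to itself, so longer cycles are found only if the relation has been closed transitively beforehand.

```python
import re


def check_policies(attribute_facts, noncyclic):
    """Report violations for (subject, attribute, value) triples.

    - "cycle":  a subject is related to itself via a non-cyclic attribute
    - "naming": an attribute symbol does not begin with a lower-case letter
    """
    violations = []
    for subject, attribute, value in attribute_facts:
        if attribute in noncyclic and subject == value:
            violations.append(("cycle", subject, attribute))
        if not re.match(r"[a-z]", attribute):
            violations.append(("naming", subject, attribute))
    return violations


facts = [
    ("car", "hasPhysicalPart", "engine"),
    ("engine", "hasPhysicalPart", "engine"),  # violates the non-cyclic axiom
    ("car", "Color", "red"),                  # violates the naming policy
]
problems = check_policies(facts, noncyclic={"hasPhysicalPart"})
```

Each reported tuple plays the role of a derived UNDEFINED fact: it does not add knowledge to the ontology but signals that a constraint has been broken.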

The three types of axioms just described are not integrated into the
ontology, because once the ontology is fixed and remains unchanged they
cannot be violated anyway. Switching them off also improves performance,
because they need not be revisited and checked again.

The definition of axioms is supported by a graphical form-based interface 
in OntoEdit (see Figure 27). 

Evaluation. The last step in ontology development is about evaluating the 
formal ontology. For this purpose, the ontology engineer may interactively 
construct and save instances and axioms into modules. OntoEdit contains a 
simple instance editor that the ontology engineer can use to create test sets. 
Another way to get test instances is to fetch them from a database. Our 
database import creates a (flat) ontology out of the relational scheme of a 
database. OntoMap is a tool that allows the user to interactively map an 
ontology to another ontology. The relationships between the mapped 
ontologies are again formally represented by F-logic axioms. In this way our 
original ontology may be populated with instances from the database. If a
query is now posed, SQL queries are generated to get the appropriate answers
from the database. Figure 28 shows the GUI for the mapping of our two
ontologies.

Fig. 27 Graphical axiom editor in OntoEdit



For instance, you may create a test case to evaluate our author rule: 

    FORALL Person1, Publication1
        Publication1:Publication[hasAuthor ->> Person1] <->
        Person1:Person[hasWritten ->> Publication1].

The publications and the persons have been mapped out of the database to 
our ontology. So it is clear that for each publication and for each person an 
instance exists. For each publication instance it is known which person 
instances are authors of the publication. The other way round is defined by 
our rule. Now the following query should deliver all publications of Rudi 
Studer though this direction has not been specified explicitly for Rudi Studer: 

    FORALL X, P, T <-
        X:Person[name ->> "Rudi Studer";
                 hasWritten ->> P[hasTitle ->> T]].

In order to locate problems, OntoEdit takes advantage of the inference
engine OntoBroker itself, which allows for introspection and also comes with 
a debugger. Axioms are operationalized by posing queries (e.g., on the test 
cases specified as seen above). To analyze the results of possible bugs in the 
rules OntoEdit provides different tools. First, for a given query, the results 
and their dependencies on existing test instances and intermediate results may 
be examined by visualizing the proof tree. The proof tree shows graphically 
which instances or intermediate results are combined by which rules to get 
the final answers. Thus the inferences drawn may be traced back to the test 
instances and semantic errors in rules may be discovered. Second, the 
inference engine may be “observed” during evaluation. As shown in Figure 
29 a graphical presentation of the set of axioms as a graph structure indicates 
which axiom is being evaluated at the moment and also shows which 
intermediate results have already been created and thus “have flowed” in the
axiom graph to other axioms. This also gives the user a feeling for how much 
time is needed to evaluate special rules. 
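
The idea behind the proof tree can be illustrated with the author rule and the test query from above. The following Python sketch is a toy reconstruction, not the debugger of OntoBroker: the functions apply_author_rule and titles_written_by, the identifiers pub1 and p1, and the title "Title A" are hypothetical. Every fact is stored together with a record of how it was obtained, so that an answer can be traced back to the test instances.

```python
def apply_author_rule(facts):
    """Apply hasAuthor <-> hasWritten once and record provenance.

    Returns a dict mapping each fact to ("given",) or
    ("author rule", premise_fact).
    """
    proof = {fact: ("given",) for fact in facts}
    for subject, relation, obj in list(facts):
        if relation == "hasAuthor":
            derived = (obj, "hasWritten", subject)
        elif relation == "hasWritten":
            derived = (obj, "hasAuthor", subject)
        else:
            continue
        # Keep the original provenance if the fact was already given.
        proof.setdefault(derived, ("author rule", (subject, relation, obj)))
    return proof


def titles_written_by(name, proof):
    """Answer the test query and attach the provenance of hasWritten."""
    answers = []
    for (person, rel1, value) in proof:
        if rel1 != "name" or value != name:
            continue
        for (subject, rel2, publication) in proof:
            if subject != person or rel2 != "hasWritten":
                continue
            for (pub, rel3, title) in proof:
                if pub == publication and rel3 == "hasTitle":
                    answers.append((title, proof[(person, "hasWritten", publication)]))
    return sorted(answers)


test_instances = {
    ("pub1", "hasAuthor", "p1"),
    ("pub1", "hasTitle", "Title A"),
    ("p1", "name", "Rudi Studer"),
}
answers = titles_written_by("Rudi Studer", apply_author_rule(test_instances))
```

The answer carries the information that hasWritten was never stated for the person but was derived by the author rule from a given hasAuthor fact, which is the kind of dependency a proof tree displays graphically.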

Fig. 28 Screenshot of OntoMap



4.3.3 OntoBroker

In order to provide a clearly defined syntax and semantics for ontologies, the 
representation of these knowledge models is based on F-logic [Kifer et al., 
1995]. OntoBroker infers answers to F-logic queries. In the following we 
give an overview of the syntax and semantics of F-logic and the 
operationalization within the OntoBroker system. 



4.3.3.1 Syntax

F-logic allows us to describe ontologies, i.e. classes, the hierarchy of classes, 
their attributes and relationships between classes in an object-oriented style. 



4.3.3.2 Semantics 

F-logic rules have the expressive power of Horn logic with negation and may 
be transformed into Horn logic rules. The semantics for a set of F-logic 
statements is defined by the well-founded semantics [Van Gelder et al., 
1991]. This semantics is close to first-order semantics. In contrast to first- 
order semantics not all possible models are considered but one “most 
obvious” model is selected as the semantics of a set of rules and facts. It is a 
three valued logic, i.e. the model consists of a set of true facts, a set of 
unknown facts, and a set of facts known to be false. In contrast to the 
stratified semantics the well-founded semantics is also applicable to rules 
which depend on cycles containing negative rule bodies. Because F-logic is 
very flexible, during the translation to normal programs such negative cycles 
often arise. 
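
The three-valued character of the well-founded semantics can be made concrete with a small program. The following Python sketch computes the well-founded model of ground propositional rules by the alternating fixpoint construction of [Van Gelder et al., 1991]. It is a textbook-style illustration under simplifying assumptions, not the evaluation method of OntoBroker: rules are ground, a rule is written as a triple (head, positive body, negative body), and the function names gamma and well_founded are chosen for this example.

```python
def gamma(rules, assumed):
    """Least model of the program reduced with respect to `assumed`.

    Rules with a negated atom in `assumed` are dropped; the remaining
    negative literals are ignored.
    """
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            if head in derived:
                continue
            if any(atom in assumed for atom in neg):
                continue
            if all(atom in derived for atom in pos):
                derived.add(head)
                changed = True
    return derived


def well_founded(rules):
    """Return (true, unknown, false) atoms of the well-founded model."""
    atoms = {h for h, _, _ in rules}
    atoms |= {a for _, pos, neg in rules for a in pos + neg}
    true = set()
    while True:
        possible = gamma(rules, true)      # overestimate: not provably false
        new_true = gamma(rules, possible)  # underestimate: certainly true
        if new_true == true:
            break
        true = new_true
    return true, possible - true, atoms - possible


program = [
    ("a", (), ()),         # a.
    ("b", ("a",), ("c",)), # b <- a AND NOT c.
    ("p", (), ("q",)),     # p <- NOT q.
    ("q", (), ("p",)),     # q <- NOT p.
]
true_atoms, unknown_atoms, false_atoms = well_founded(program)
```

In this program a and b are true and c is false, while p and q depend on each other through negation and therefore remain unknown. A stratified semantics would reject the last two rules altogether.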




Fig. 29 Visualizing inferencing with OntoBroker in OntoEdit



4.3.3.3 Operationalization

OntoBroker provides means for efficient reasoning with instances and for
expressing arbitrarily powerful rules, e.g. ones that quantify over the
set of classes. The most widely published operational semantics for F-logic is 
the alternating fixpoint procedure [Van Gelder et al., 1991]. This is a forward 
chaining method which computes the entire model for the set of rules, i.e. the 
set of true and unknown facts. To answer a query the entire model must be 
computed (if possible) and the variable substitutions for the query are then 
derived. In contrast, the inference engine OntoBroker performs a mixture of 
forward and backward chaining based on the dynamic filtering algorithm 
[Kifer & Lozinskii, 1986] to compute the (smallest possible) subset of the 
model for answering the query. In most cases this is much more efficient than 
the simple evaluation strategy. These techniques stem from the deductive 
database community and are optimized to deliver all answers instead of one 
single answer as, for example, resolution does. We have shown this for test 
cases where all paths in large graphs are computed. The results are shown in 
Figure 30. We measured the time in milliseconds that OntoBroker (versions 3.1
and 3.2) and SiLRI (the academic prototype that implements the RDF
inference engine of On2broker) need to compute all paths for a given
number of graphs. In contrast to SiLRI, OntoBroker shows almost linear
growth in time and therefore scales up for such scenarios.
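
The difference between computing the entire model and computing only the part that a query needs can be illustrated with the path example. The following Python sketch is not the dynamic filtering algorithm of [Kifer & Lozinskii, 1986]; it only contrasts a naive forward-chaining computation of all paths with an evaluation that starts from the constant bound in the query. The function names full_closure and paths_from and the sample graph are assumptions made for the example.

```python
from collections import defaultdict


def full_closure(edges):
    """Forward chaining: derive every path (a, d) in the graph."""
    paths = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(paths):
            for (c, d) in edges:
                if b == c and (a, d) not in paths:
                    paths.add((a, d))
                    changed = True
    return paths


def paths_from(start, edges):
    """Query-focused evaluation: only follow edges reachable from `start`."""
    successors = defaultdict(list)
    for a, b in edges:
        successors[a].append(b)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return {(start, node) for node in seen}


edges = [("a", "b"), ("b", "c"), ("x", "y"), ("y", "z")]
everything = full_closure(edges)
focused = paths_from("a", edges)
```

Both strategies give the same answers for a query about paths starting in a, but the focused evaluation never touches the part of the graph around x, y, and z.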

Fig. 30 Comparison of OntoBroker and SiLRI

Fig. 31 Screenshot of OntoAnnotate



4.3.4 OntoAnnotate: Meta-Information for Documents 

We have seen how one can map structured data sources like relational 
databases via OntoEdit onto a knowledge model and make them processable via
OntoBroker. 

To disambiguate the meaning of distinct items of information and 
statements in unstructured documents and make them accessible and 
interpretable for information systems, these items of information have to be 
mapped to the knowledge model as well. Annotation is the means to bring 
metadata into documents to make the meaning of information explicit. 
OntoAnnotate is a tool framework to create this kind of relational metadata 
(see Figure 31). It allows the quick annotation of facts within any document 
by tagging parts of the text and defining its meaning by mapping the text to 
the appropriate concept in the knowledge model by drag and drop. 
OntoAnnotate enables the annotation of concepts, their attributes and 
relations by using the drag and drop methods of the tool. Given an ontology, 
the annotation process usually starts by tagging one or more phrases in the 
document. The user marks up the desired word or part of the text in the 
document, drags the appropriate concept in the ontology and drops it on the
marked text. OntoAnnotate embeds the corresponding metadata around the
text in the document. After highlighting a part of the text in the document, the 
user may select a concept, the appropriate existing instance or a new one, and 
an attribute or a relation. Just by making the desired choices from the concept 
tree, instance or relational view, the user can quickly annotate the document. 
With the features of our annotation tool, we support annotators with an 
interactive graphical means, helping to avoid syntax errors. We support them 
in choosing the most appropriate concepts for instances and provide an object 
repository to identify existing instances. 



4.4 On-To-Knowledge: Evolving Ontologies 
for Knowledge Management 

On-To-Knowledge 22 (see [Fensel et al., 2002(c)], [Davies et al., 2003]) was a
project in the 5th European Information Society Technologies (IST)
Framework program. It provides improved information access in digital
networks. The goal of the On-To-Knowledge project is to support efficient 
and effective knowledge management. It focuses on acquiring, maintaining, 
and accessing weakly structured on-line information sources: 

• Acquiring: Text mining and extraction techniques are applied to extract 
semantic information from textual material (i.e., to acquire information). 

• Maintaining: RDF and XML are used for describing the syntax and
semantics of semistructured information sources. Tool support enables 
automatic maintenance and view definitions of this knowledge. 

• Accessing: Push services and agent technology support users in 
accessing this knowledge. 

For all tasks, ontologies are the key asset in achieving the functionality 
described. Ontologies are used to annotate unstructured information with 
structural and semantic information. Ontologies are used to integrate 
information from various sources and to formulate constraints on their 
content. Finally, ontologies help to improve user access to this information. 
Users can define their own personalized view, their user profile, and their 
information agents in terms of an ontology. On-To-Knowledge develops a 
three-layered architecture for information access. At the lowest level (the 
information level), weakly structured information sources are processed to 
extract machine-processable meta-information from them. The intermediate
level (the representation level) uses this meta-information to provide
automatic access, creation, and maintenance of these information sources.
The highest level (the access level) uses agent-based techniques as well as
state-of-the-art querying and visualization techniques that fully employ
formal annotations to guide user access of information.

Fig. 32 The architecture of On-To-Knowledge

A key deliverable of the On-To-Knowledge project is the resulting 
software toolset. Several consortium partners are participating in the effort to 
realize in software the underpinning ideas and theoretical foundations of the 
project. A major objective of the project is to create intelligent software to 
support users both in accessing information and in the maintenance, 
conversion, and acquisition of information sources. Most of the tools 
presented in Figure 32 are described below. 

QuizRDF (see [Davies et al., 2002(a)]) is an ontology-based tool for
knowledge discovery which combines traditional keyword querying of
WWW resources with the ability to browse and query against RDF
annotations of those resources. RDFS and RDF are used to specify and
populate an ontology and the resultant RDF annotations are then indexed
along with the full text of the annotated resources. The index allows keyword
querying both against the full text of the document and against the literal
values occurring in the RDF annotations, along with the ability to browse and
query the ontology. This ability to combine searching and browsing
behaviors more fully supports a typical information-seeking task than
“traditional” search engine technology. The approach is characterized as “low
threshold, high ceiling” in the sense that where RDF annotations exist they
are exploited for an improved information-seeking experience but where they
do not yet exist, a search capability is still available.

OntoShare (see [Davies et al., 2002(b)]) enables the storage of best practice
information in an ontology and the automatic dissemination of new best
practice information to relevant co-workers. It also allows users to browse or
search the ontology in order to find the most relevant information to the
problem that they are dealing with at any given time. The ontology helps to
orientate new users and acts as a store for key learning and best practices
accumulated through experience.

Spectacle organizes the presentation of information. This
presentation is ontology-driven. Ontological information, such as classes or
specific attributes of information, is used to generate exploration contexts for
users. An exploration context makes it easier for users to explore a domain.
The context is related to certain tasks, such as finding information or buying
products. The context consists of three modules:

• content: specific content needed to perform a task; 

• navigation: suitable navigation disclosing the information; 

• design: applicable design displaying the selected content. 

OntoEdit [Sure et al., 2002] makes it possible to inspect, browse, codify and 
modify ontologies, and thus serves to support the ontology development and 
maintenance task. Modelling ontologies using OntoEdit involves modelling 
at a conceptual level, viz. (i) as independently of a concrete representation 
language as possible, and (ii) using GUIs representing views on conceptual 
structures (concepts, concept hierarchy, relations, axioms) rather than 
codifying conceptual structures in ASCII. 

The Ontology Middleware Module (OMM) can be seen as the key 
integration component in the On-To-Knowledge technical solution 
architecture. It supports well-defined application programming interfaces 
(OMM API) used for access to knowledge and deals with such matters as: 
ontology versioning, including branching; security (user profiles and groups
are used to control the rights for access, modifications, and publishing);
meta-information and ontology lookup (support for meta-properties for whole
ontologies, as well as for separate concepts and properties); access via a
number of protocols.

From a functional point of view, OMM supports the two major scenarios 
of usage of the On-To-Knowledge tools as follows: 

• Knowledge engineering - the versioning system makes OMM/Sesame
an ideal environment for collaborative knowledge engineering, enabling
a development style similar to the one that source control systems (such as
CVS) provide for software development. The Sesame/OMM plug-in for
OntoEdit allows multiple knowledge engineers to use the editor as a 
front-end, downloading the latest version of the ontology from OMM 
and uploading their contributions. In order for this scenario to work, 
OMM silently does smart merging as well as tracking of the changes 
introduced. At each moment, the updates can be listed and older versions 
can be retrieved. 

• Knowledge use - the access control sub-system of OMM makes possible 
the definition of fine-grained security policies which can capture fairly 
complex business logic. This unique feature, combined with easy 
integration (because of the multi-protocol support for the API), makes 
OMM an ideal back-end for enterprise knowledge management 
applications. 

Following the spirit of the On-To-Knowledge toolset, OMM integrates
tightly only with the Sesame repository and the OntoEdit editor, but
provides guaranteed interoperability with the rest of the tools.

Sesame is a system that allows persistent storage of RDF data and schema 
information and subsequent online querying of that information. Sesame has 
been designed with scalability, portability and extensibility in mind. One of 
the most prominent modules of Sesame is its query engine. This query engine 
supports a query language called RQL. RQL supports querying of both RDF 
data (e.g. instances) and schema information (e.g. class hierarchies, domains 
and ranges of properties). RQL also supports path-expressions through RDF 
graphs, and can combine data and schema information in one query.

BOR provides additional reasoning services so as to extend the functionality
provided by Sesame. Most of the classical reasoning tasks for description
logics are available, including realization and retrieval. The goal was to
enable an even wider set of applications, such as information extraction and
automatic ontology integration. A strategy called pre-reasoning was used to 
implement workarounds for a number of logical problems proven to be 
computationally intractable for languages as expressive as OIL. 
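
What it means for a query to combine data and schema information, as described for the query engine of Sesame above, can be shown with a handful of triples. The following Python sketch does not use RQL syntax or the Sesame API. The in-memory triple set, the class names, and the functions subclasses_of and instances_of are hypothetical and serve only to illustrate the principle.

```python
TRIPLES = {
    # Schema information
    ("Professor", "rdfs:subClassOf", "Person"),
    ("Student", "rdfs:subClassOf", "Person"),
    # Data (instances)
    ("alice", "rdf:type", "Professor"),
    ("bob", "rdf:type", "Student"),
    ("doc1", "rdf:type", "Document"),
}


def subclasses_of(cls, triples):
    """Schema query: the class itself plus all direct and indirect subclasses."""
    result = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def instances_of(cls, triples):
    """Combined query: instances typed with the class or any subclass."""
    classes = subclasses_of(cls, triples)
    return sorted(s for s, p, o in triples if p == "rdf:type" and o in classes)


people = instances_of("Person", TRIPLES)
```

No triple states that alice or bob is a Person; the answer only arises because the class hierarchy is consulted while the instance data is being queried.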

Information extraction and ontology generation are performed by means
of the CORPORUM toolset (OntoWrapper and OntoExtract) and are 
situated in the extraction layer. CORPORUM has two related, though 
different, tasks: interpretation of natural language texts and extraction of 
specific information from free text. Whereas CORPORUM tools can perform 
the former process autonomously, the latter task requires a user who defines 
business rules for extracting information from tables, (phone) directories, 
homepages, etc. Although this task is not without its challenges, most effort 
focuses on the former task, which involves natural language interpretation on 
a syntactic and lexical level, as well as interpretation of the results of that 
level (discourse analysis, co-reference and collocation analysis, etc.). 

The tool environment has been developed by the companies
Aidministrator 23, BT Labs 24, Ontotext 25, and CognIT 26. It is embedded in a
methodology that provides guidelines for introducing knowledge 
management concepts and tools into enterprises, helping knowledge 
providers to present their knowledge efficiently and effectively. 



23 http://www.aidministrator.nl 

24 http://www.bt.com/innovation/exhibition/knowledge_management

25 http://www.ontotext.com 

26 http://www.cognit.no 




5 Applications 



A technology can only be justified by successful applications. Therefore, 
there is a clear need to talk about the interesting application areas of ontology 
technology. However, the fast iteration of marketing waves makes it quite
hard to see the real and stable ground. The need for ontologies arises from
(electronic) information sharing and reuse. Therefore, we will take the 
triangle of intranet, Internet, and extranet (see Fig. 33) as our organizing 
metaphor when talking about application areas. Not all of them are typical of
just one network type; however, it helps to reduce the chaos of the overall
picture. Let us first characterize the three different types of networks: 

• Intranet: closed user community, company- or organization-wide use. 

• Internet: open access; worldwide user community. 

• Extranet: limited access from the outside (Internet) to an intranet. 

Here we classify the following application areas: 

• Knowledge Management in a technical sense is about the integration of 
heterogeneous, distributed and mostly semistructured information 
sources. 

• Web Commerce is about advanced end-consumer e-commerce 
(business-to-consumer, or B2C).

Fig. 33 One technology, various application areas




